Abstract

This paper develops a theory of high and low (extremal) quantile regression: the
linear models, estimation, and inference. In particular, the models coherently combine
the convenient, flexible linearity with the extreme-value-theoretic restrictions on
tails and the general heteroscedasticity forms. Within these models, the limit laws for
extremal quantile regression statistics are obtained under the rank conditions
(experiments) constructed to reflect the extremal or rare nature of tail events. An
inference framework is discussed. The results apply to cross-section (and possibly
dependent) data. The applications, ranging from the analysis of babies' very low
birthweights, (S,s) models, and tail analysis in heteroscedastic regression models to
outlier-robust inference in auction models and decision-making under extreme
uncertainty, provide the motivation for this theory.
1      Introduction

Regression quantiles, Koenker and Bassett[48], represent a flexible and informative
method of regression analysis as they describe the conditional distribution of the
response variable Y given covariate X, without imposing rigid distributional assumptions.
The goal of this paper is to model and make inference on the extremal (near-extreme
high or low) regression quantile functions. In essence, they represent the models of
the extremal values of Y conditional upon X. For example, the near-extreme 0.1-th
conditional quantile function describes the values below which Y falls with probability
10% given values of X.

Modeling high or low conditional quantiles is motivated by many examples. Some
include: (i) in micro-economics: (S,s) models of investment, inventory, employment
shortages; auction models, reservation wage equations; (ii) in finance, micro- and
macro-economics: decision-making under extreme uncertainty, where good risk measures
are vital for the purposes of insurance, safety-first resource allocation, and
management of risks; and many others.

The ordinary extremal quantiles, the models and the sample analogs, have been the
main subject of classical and modern extreme value theory, which forms an important
field of applied and theoretical statistics.^ The theory was developed by Von Mises,
Fréchet, Fisher, Gnedenko, Smirnov, de Haan, and many others. The ordinary sample
quantiles have an immense inference role, providing the estimators of the tail index
and other tail functionals (Pickands[57], Hill[40], Dekkers and de Haan[21]). Analogous
motivations underlie the present analysis as well.

In this paper we study the extremal (high and low) conditional quantiles - the linear
models and the sample regression analogs. In particular, the models coherently combine
convenient, flexible linearity with the extreme-value-theoretic restrictions on tails and
the general heteroscedasticity forms. Within these models, the limit laws for extremal
quantile regression statistics are obtained under the rank conditions constructed to
reflect the extremal or rare nature of tail events. The goal is the practical, important
problem of modeling and making inference on the .3-th and lower and .7-th and higher
regression quantiles, as well as conducting the tail inference, in the common economic
data sets.^ (Our target is not the "exotic" 0-th quantile.)

The rank conditions approximate the degrees of lack of data or extremality pertinent
to the inference about the quantiles of interest. Define rank $r$ as the quantile index
$\tau$ times the sample size $T$. The extreme and intermediate rank conditions apply to
the cases where index $\tau$ is extremal (e.g. .1, .2) and the rank is low ($r$ is small)
or not low ($r$ is large) relative to the sample size $T$.^ Formally, the sequence of the
quantile index-sample size pairs $(\tau_T, T)$ is an extreme rank sequence if

(i)     $\tau_T \searrow 0$,    $\tau_T T \to k > 0$,

and an intermediate rank sequence if

(ii)     $\tau_T \searrow 0$,    $\tau_T T \to \infty$.

^ As of today, thousands of papers are devoted to extreme value theory. Many excellent books
give systematic treatments. See e.g. [7], [64], [65], [52], [33], [34], [26], [73].
^ Say with $T < 3000$ and a typical number of regressors, 5 to 10 and higher.
^ Heuristically, $r$ is the number of observations available to make inference on the
$\tau$-th regression quantile.


Because the principles (i) and (ii) constructively exploit that the relevant data is
formed by the tail events and/or is scarce, they lead to

a.  asymptotic distributions that either fit the finite-sample distributions better or
are more parsimonious than the conventional approximations,

b.  important tail inferences, based on the extremal regression quantiles.

In evaluating these concepts, it is important to keep in mind that these alternative
sequences are designed to yield better approximations in given practical problems with
given sample sizes, even when the quantile index is not very low. And, in case (a), it is
completely irrelevant whether or not future sampling will lead to samples conforming
to these sequences.

The concepts (i) and (ii) are well motivated by the intellectual and practical success
of the extreme value theory, which focused on the ordinary sample quantiles. The
concepts are also similar in spirit to other "alternative" asymptotics, e.g., GMM when the
number of moment conditions is large; weak instruments theory, where the instruments
are weakly correlated with the regressors; near-to-unit root theory; and, generally, the
theory of statistical experiments.

The organization and contributions of this paper are as follows:

1.  Section  2  demonstrates  the  relevance  of  the  problem  in  economic  analysis. 

2.  Section  3  introduces  the  principles  of  extremality  for  the  regression  quantiles  - 
namely  that  of  the  intermediate  and  extreme  rank  sequences. 

3.  Section 4 develops the models of the extremal (low) conditional quantiles. They
coherently combine the linear functional forms with the extreme-value-theoretic
restrictions, and lead to non-degenerate, parsimonious limit distributions. The models
are distribution-free and flexible, allowing for sophisticated effects of covariates on the
shape of the conditional distribution (the scale, kurtosis, skewness, etc.). Importantly,
these models do not admit reductions to the classical one-sample case (by removing the
conditional mean and/or scale).

4.  Within the formulated models, section 5 provides the asymptotic limit theory
for the sample regression quantiles under the extreme rank condition, $\tau T \to k > 0$.^
The limit is driven by a stochastic integral of a "residual" function with respect to a
Poisson point process.

5.  Section 6, using additional tail restrictions, provides the asymptotic distributions
of regression quantiles under the intermediate rank condition, $\tau T \to \infty$,
$\tau \to 0$. The limit is normal, with variance parsimoniously determined by the tail
indices. This enables a very practical inference. (In contrast, the conventional theory
requires the nonparametric estimates of the conditional density functions evaluated at
the extremal quantiles.) This provides a regression analogue of fairly recent results of
Dekkers and de Haan[21].

^ This paper is not about $\tau = 0$, the linear programming estimator (also called the
'extreme regression quantile'), considered in Feigin and Resnick[29], Portnoy and
Jureckova[60], Chernozhukov[12], and Knight[47] within the location-shift model (covariates
affect only the location, but not the scale, shape, or tail of the conditional distribution).
That estimator, defined as $\max_\beta \bar{X}'\beta$ s.t. $Y_t \ge X_t'\beta$, $\forall t$,
and useful as a boundary estimate, cannot be used at all in the present context. We look at
different estimators (high and low regression quantiles) that have very different asymptotics
and applications [$\tau > 0$ ($T$ is finite); see examples 2.1-2.4, where the support is
unbounded or the "boundaries" depend on unobserved variables]. We also develop and operate
with very different models.


6.  We  conclude  by  discussing  an  inference  theory  and  an  empirical  paper  [15]. 

Also  relevant  are  the  works  of  Smith[73],  Tsay[75],  and  references  therein,  who 
develop  the  models  of  exceedances  over  high  constant  thresholds.  The  parametric 
likelihood  of  the  Pareto  family  is  used  to  describe  such  data,  and  parameters  are  made 
dependent  on  regressors.  It  should  be  clear  that  the  goals,  models,  and  methods  of 
this  paper  are  quite  different.  Our  analysis  should  be  viewed  as  complementing  the 
study  of  the  central  rank  regression  quantiles,  with  the  motivation  stemming  from  the 
wide  use  of  quantile  regression  in  data  analysis  in  econometrics  and  statistics. 

2      Econometric  Applications 

Quantile regression is a popular tool in econometric applications. See Abadie et al.[1],
Buchinsky[9], Chamberlain[11], Poterba and Rubin[61], and the review of Koenker and
Hallock[50]. Our results can be useful in almost any such application, since our focus is
the inference about high and low conditional quantiles (say .7 and higher and .3 and
lower, in a typical data-set), recognizing the extremality and/or scarcity of the tail
events. Important inferences about the tail shapes can be made as well. There are
many examples where high or low quantiles are of particular interest. For example,
Abrevaya[2] and Koenker and Hallock[50] characterize the economic determinants of
babies' very low birth-weights through the near-extreme conditional quantiles (.05 and
below). Deaton[20] examines food expenditure of Pakistani households by the .1-th and
.9-th conditional quantiles. The following presents a brief discussion of some others.

Example 2.1   (Determinants of Generalized (S,s) Models.)

The (S,s) theory is widely used in the firm-level microeconomic studies, including
the analysis of durable good inventories, employment shortages, and investment in
capital goods. Lumpiness of adjustments, a main prediction, is well documented. See, e.g.,
Arrow et al.[5], Scarf[71], Rust and Hall[37], Aguirregabiria[3], Caballero and Engel[10].

In the (S,s) theory, a firm allows a state variable $Y_t$ (capital stock, inventory) to fall
until it reaches a lower barrier, $s(X_t)$, at which point the stock is replenished (jumps)
to an upper barrier, $S(X_t)$. Such decisions are optimal in general settings (Hall and
Rust[38]). $X_t$ may include prices and other variables that affect the firm's beliefs
about future sales and costs (e.g. industrial production and commodity price indices,
interest rates). Assume that $(Y_t, X_t)$ are observed for a cross-section of firms. Absent
unobserved heterogeneity, $s(X_t)$ and $S(X_t)$ are exactly the minimal and maximal
conditional quantiles of $Y_t$, given $X_t$. Otherwise, $s(X_t)$ and $S(X_t)$ are still
strongly related to the extremal conditional quantiles.

To address the unobserved heterogeneity, Caballero and Engel[10] introduce the
stochastic barriers $[s(X_t) - v_t, S(X_t) + e_t]$ with unobserved time- and firm-specific
random components $e_t, v_t$. They propose probabilistic adjustment models (hazard
functions) to describe the evolution of $Y_t$ across firms and/or time. In such models, the
high and low conditional quantiles also describe the probabilistic rules. For example,
the inventory variable $Y_t$ is below the .1-th conditional quantile only with probability
10%, given $X_t$. Inference about such functions is exactly our area of focus. More
generally, we can map quantile functions into hazard functions, and vice-versa.

Figure 1: (S,s) model with stochastic bands $[s(x) - v, S(x) + e]$, where $e$ and $v$ are the
unobserved firm and time specific random components. Panel (A): Data on a single firm may
be generated by discrete sampling from the time path of $(Y_t, X_t)$. E.g. Rust and Hall[37].
Panel (B): Data $(Y_i, X_i)$ may be generated as a cross-section of plants. E.g. [3], [10].

Apart from descriptive analysis, extremal quantile regression can estimate the
determinants of the $(S(x), s(x))$ functions. Specifically, suppose that

$$[s(X_t) - v_t,\ S(X_t) + e_t]$$

constitute the adjustment barriers, with $v_t, e_t > 0$, $s(X_t) < S(X_t)$ a.s., so that the
interval is non-empty. The timing is continuous. If $Y_t$ hits the lower bound $s(X_t) - v_t$,
it is adjusted to the upper bound $S(X_t) + e_t$. Pairs $(Y_t, X_t)$ are the observed draws of
different firms (or a panel, stationarity assumed). For brevity, let us focus on $s(X)$.
Suppose $v_t$ and $e_t$ are independent of $X_t$;^ then for $c > 0$:

$$P(Y_t < s(X_t) - c \mid X_t) = E P(Y_t - s(X_t) < -c \mid X_t, c < v_t) \cdot P(c < v_t)
  + E P(Y_t - s(X_t) < -c \mid X_t, c > v_t) \cdot P(c > v_t).$$

By construction $P(Y_t - s(X_t) < -c \mid X_t, c > v_t) = 0$. Additionally, impose the following
tail homogeneity condition: for all $c > 0$ sufficiently close to $v_t$:^

$$P(Y_t - s(X_t) < -c \mid X_t, c < v_t) = a(v_t - c). \qquad (2.1)$$


Thinking of $Y_t - [s(X_t) - v_t]$ as a positive "duration" variable, (2.1) states an
"accelerated failure time" model for the tail (which is more general than in [10]). (2.1)
imposes no restrictions on the central features of the conditional distribution of $Y_t$,
which is reasonable, since the (S,s) theory does not relate the central features to the
adjustment barriers. For example, the symmetry or homoscedasticity assumptions are
unreasonable. This implies that for $-c$ low enough and some low constant $\varphi(c)$

$$P(Y_t - s(X_t) < -c \mid X_t) = \varphi(c),$$

or, equivalently, that for small $\tau > 0$

$$Q_{Y_t}(\tau \mid X_t = x) = s(x) - c(\tau)$$

is the $\tau$-th conditional quantile of $Y_t$ given $X_t$. Therefore, $s(x)$ equals the low
(extremal) conditional quantiles up to an additive constant. Notably, it is not possible to
estimate $s(x)$ off the central features of the conditional distribution of $Y_t$, as
discussed above. The inference about $Q_Y(\tau \mid X)$ for low values of $\tau$ is exactly
our area of focus. The analytical examples of Rust and Hall suggest that linear/polynomial
functions are excellent descriptions of $(s(x), S(x))$ functions.

^ This is reasonable, since $S(x), s(x)$ incorporate the barrier component that depends on $X$.
Nevertheless, we can allow $e, v$ to be dependent on $X$. A note is available upon request.
^ "=" can be replaced by "~", as $c$ increases.

Example 2.2 (Tail Analysis in Regression Models) The tail shape (index) of the
conditional distribution is important in the regression analysis. For example, the
thick-tailed distributions favor the LAD and other estimators more than the OLS. Thus,
knowing the tail index helps determine better estimators. On the other hand, the tail
shapes are important in describing the large insurance claims ([26]), the analysis of
the long and short term survival and durations ([49], [42]), and financial data (e.g.
Mandelbrot[54], Fama[28], Kearns and Pagan[45], Danielsson and de Vries[17]). In
the non-regression setting, the tail index estimators of Hill and Pickands have been
used countless times in empirical analysis. However, estimation of the tail index in the
presence of the shape heteroscedasticity (scale, skewness, kurtosis, and other forms) is
largely an open, difficult problem. Our results allow one to construct the regression
analogs of the Pickands and Hill tail index estimators, based on the extremal regression
quantiles, which specifically adapt to the shape-heteroscedastic setting, and are simple
in practice. Section 7 offers a discussion, and [15] provides an empirical application.

Example  2.3  (Decision  Making  under  Extreme  Uncertainty)  Risk  is  a  key 
subject  of  non-financial  and  financial  decisions,  insurance,  and  regulation.  Both  the 
firms  and  the  regulators  are  seriously  concerned  about  extreme  risks  -  the  tail  events 
that  can  wipe  out  capital,  hindering  liquidity  or  solvency. 

An important branch of economics literature is devoted to safety-first decision making.
See Roy[68], Telser[74], Pyle and Turnovsky[63], Bertail et al.[8] and others. In
this approach, the decision-makers (firms, investors, regulators) solve either:

1.  $\max z$,  or      2.  $\max \mu(a)$,      subject to $Q_{Y_t(a)}(\tau \mid X_t) \ge z$,

where $Y_t(a)$ is the random payoff (e.g. private or public benefits and profits) to the
decision $a$ (technology, portfolio composition, buffer stocks, quality/quantity of food
control); $z$ is the safety margin or disaster level of the payoff; $\tau$ is the probability
of the disaster or of exceeding the margin, set to be small; $\mu(a)$ is the mean of $Y_t(a)$;
$Q_{Y_t(a)}(\tau \mid X_t)$ is the conditional $\tau$-th quantile function of $Y_t(a)$ given
$X_t$, the vector of variables representing the current state. $Q_{Y_t(a)}(\tau \mid X_t) \ge z$
is the conditional (extremal) quantile constraint, requiring the disaster probability to be
small: $P(Y_t(a) < z \mid X_t) \le \tau$. This presents a problem of inference concerning the
conditional extremal quantiles. Our models are flexible (central features of the distribution
do not determine the tail features) and specifically exploit the extremality and scarcity of
the tail events.

In  Chernozhukov  and  Umantsev  [15],  we  apply  the  present  results. 

Safety-first decisions are very important in the finance industry, where quantiles
(value-at-risk) are the required measures of the high level infrequent risk, used to
determine the capital requirements and for other external and internal purposes. See [25],
[56], [27], [35], among others, for a sample of illuminating research as well as reviews.
Value-at-risk is computed as the level below which the (daily or weekly) return falls only
1% or 5% of the time (.01-th and .05-th quantiles). Again, this is a problem concerning
the conditional extremal quantiles.
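As a rough illustration of the value-at-risk definition above, the following sketch computes a low sample quantile as an order statistic. It ignores the covariates $X_t$ (so it is the unconditional, no-regressor case), and the return numbers and the helper name `empirical_quantile` are hypothetical, chosen only for illustration.

```python
import math


def empirical_quantile(values, tau):
    """Return the ceil(tau * T)-th order statistic (1-indexed) of the sample."""
    if not 0 < tau <= 1:
        raise ValueError("tau must lie in (0, 1]")
    ordered = sorted(values)
    # Small tolerance guards against floating-point noise in tau * T.
    rank = max(1, math.ceil(tau * len(ordered) - 1e-9))
    return ordered[rank - 1]


# Hypothetical daily returns in percent (illustrative numbers, not real data).
returns = [0.4, -1.2, 0.1, 2.3, -0.7, -3.5, 0.9, 1.1, -0.2, 0.6,
           -2.1, 0.3, 1.8, -0.9, 0.2, -4.8, 0.7, 1.4, -0.4, 0.5]

var_05 = empirical_quantile(returns, 0.05)  # rank 1 of 20 observations
var_10 = empirical_quantile(returns, 0.10)  # rank 2 of 20 observations
print(var_05, var_10)
```

With only 20 observations the .05-th quantile rests on a single data point, which is the data scarcity that the extreme rank conditions are meant to reflect.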

Example 2.4 (Simple Robust Inference in Boundary-Dependent Models)

Parametric boundary-dependent likelihoods, arising in the models of job search and
auctions (see [16], [31], [23], [41], [36] for a sample of remarkable works), take the form:

$$L(\beta, \gamma) = \sum_t \ln f(Y_t \mid X_t, \gamma, \beta) \cdot 1(Y_t > X_t'\beta),$$

where $f(X_t'\beta \mid X_t, \gamma, \beta) > 0$ a.s. and is finite. $\beta$ and $\gamma$ are the
boundary and shape parameters, respectively. Linearity of the boundary is not essential (see
below).

Likelihood procedures, e.g. ML, estimate $\gamma$ and $\beta$ jointly. The estimates
$\hat\beta$ are characterized by $d = \dim(X)$ constraints, $Y_t = X_t'\hat\beta$, where $Y_t$
is among the extremal values of $Y_t$; see [23] and [41]. For example, $\min_{t \le T} Y_t$ is
the boundary estimate in the no-regressor case. Therefore, having a few outlier observations
$Y^o$ (such that $Y^o < X'\beta$) severely biases and renders inconsistent the estimates of
both $\beta$ and $\gamma$. The outliers arise as misrecordings of the bid with a low
probability (not the usual additive measurement error) or bid mistakes. Bajari[6] offers a
substantive analysis, suggesting outliers are responsible for drastic overestimates of the
mark-ups in prominent auction studies.

Suppose the number of outliers $Y^o$ is bounded by a constant $K$, independent of $T$.
Consider the $\tau$-th near-extreme regression quantile estimator $x'\hat\beta(\tau)$ of the
boundary $x'\beta$, with quantile index $\tau = M/T$, $M \sim \ln T$. Asymptotically
$x \mapsto x'\hat\beta(\tau)$ passes above the outliers, and is $T/\ln T$-rate-consistent.
Substitute $\hat\beta(\tau)$ into $L(\beta, \gamma)$ and estimate $\gamma$ via ML. The
resulting estimator of $\gamma$ is efficient. Chernozhukov and Hong[14] offer an analysis.
Although we focus on the linear boundaries, a non-linear extension in this model is
straightforward. Regardless, the linear forms include the polynomial and piece-wise linear
specifications, approximating the smooth parametric functions as well as we like.

3     Extremal  Quantiles  and  Rank  Sequences 

This  section  defines  the  linear  regression  model,  the  sample  regression  quantiles,  the 
extreme  and  intermediate  rank  concepts  for  these  statistics,  and  the  tail  types. 

3.1      Extremal  Conditional  Quantiles 

Suppose $Y_t$ is the response variable in $\mathbb{R}$, and $X_t$ are the conditioning
variables in $\mathbb{R}^d$. The $\tau$-th conditional quantile function $Q_Y(\tau \mid x)$ is
a function $q(x)$ that satisfies the relationship $P(Y \le q(X) \mid X) = \tau$. For instance,
$Q_Y(.25 \mid x)$ and $Q_Y(.1 \mid x)$ are the conditional first quartile and decile
functions. Formally,

$$Q_Y(\tau \mid x) = F_Y^{-1}(\tau \mid x),$$

where $F_Y^{-1}(\cdot \mid x)$ is the inverse of $F_Y(\cdot \mid x)$. Our focus is exclusively
on modeling and making inference on the extremal conditional quantile functions:

$$Q_Y(\tau \mid x), \text{ where } \tau \text{ is near } 0.$$

The formal concept of extremality or nearness will be developed later.


3.2  Linear  Quantile  Regression  Model 

In this paper we consider the linear model for quantiles of interest $\mathcal{I}$:

$$Q_Y(\tau \mid x) = F_Y^{-1}(\tau \mid x) = x'\beta(\tau), \quad \forall \tau \in \mathcal{I}, \qquad (3.2)$$

where $\beta(\cdot)$ is an unknown function of $\tau$. Here it is necessary that (3.2) holds for

$$\mathcal{I} = [0, \eta], \text{ where } \eta > 0. \qquad (3.3)$$

If $\eta$ is small, the linearity is assumed only for low quantiles, and not necessarily for
other quantiles. We assume that $X$ has (or is trimmed to) a compact support $\mathbf{X}$.

The model (3.2) is implied, for instance, by the classical linear location-scale models
with unknown error distribution, but is considerably more flexible in the sense that the
shape of the conditional density may change with the covariates. $X$ may incorporate a
wide array of polynomial and other transformations of the observed covariates. On its
basis, in section 4, we develop the models with the extreme-value-theoretic restrictions
on the conditional tails.

Note two approaches to linear modeling. One approach assumes linearity of a single
or few quantiles (Buchinsky[9], Horowitz[43], Powell[62]). Another approach (Koenker
and Machado[51]) assumes the linearity of all quantile functions, $\mathcal{I} = [0, 1]$. The
"local in $\tau$ linearity" assumption made here is closer to the first approach.

Despite convenience, having linearity for several $\tau$ may pose an avoidable caveat
(the curves may cross). First, $X$ is often a transformation of the original covariates,
so the curves are non-linear in the original space (see [49]). Second, given compactness
of support $\mathbf{X}$, the linear model is always coherent. Take a countable, possibly
finite collection of non-crossing curves $\{x \mapsto x'\beta(\tau_i), i \in J\}$ with domain
$\mathbf{X}$. Define $x \mapsto x'\beta(\tau)$ for other $\tau$ by taking appropriate convex or
linear combinations of these lines. By construction, the lines cross only outside
$\mathbf{X}$. This also defines the conditional c.d.f.
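The convex-combination construction can be checked numerically. The sketch below is only an illustration under assumed values: the support is taken to be $[0, 1]$, the two anchor coefficient vectors at $\tau = .05$ and $\tau = .20$ are hypothetical, and linear interpolation in $\tau$ is one possible choice of convex combination.

```python
def line_value(beta, x):
    """Evaluate x'beta for the regressor vector (1, x)."""
    return beta[0] + beta[1] * x


def interpolate_beta(tau, tau_lo, beta_lo, tau_hi, beta_hi):
    """Convex combination of two coefficient vectors for tau in [tau_lo, tau_hi]."""
    weight = (tau - tau_lo) / (tau_hi - tau_lo)
    return tuple((1.0 - weight) * lo + weight * hi
                 for lo, hi in zip(beta_lo, beta_hi))


def ordered_on(grid, betas):
    """True if the lines are strictly increasing in list order at every grid point."""
    return all(
        line_value(lower, x) < line_value(upper, x)
        for x in grid
        for lower, upper in zip(betas, betas[1:])
    )


# Hypothetical anchor lines; they intersect at x = -2, outside the support [0, 1].
BETA_05 = (1.0, 2.0)   # x'beta(.05) = 1 + 2x
BETA_20 = (2.0, 2.5)   # x'beta(.20) = 2 + 2.5x
BETA_10 = interpolate_beta(0.10, 0.05, BETA_05, 0.20, BETA_20)

support_grid = [i / 100 for i in range(101)]
curves = [BETA_05, BETA_10, BETA_20]
non_crossing_inside = ordered_on(support_grid, curves)
ordered_outside = ordered_on([-3.0], curves)
print(non_crossing_inside, ordered_outside)
```

The interpolated line inherits the ordering of the anchor lines on the support, while the ordering fails at a point beyond the intersection, which lies outside the support.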

3.3  Sample  Regression  Quantile  Statistics 

Suppose we have $T$ observations $\{Y_t, X_t\}$. In the no-covariates case the sample
$\tau$-th quantile $\hat\beta(\tau)$ is generated by solving the problem

$$\min_{\beta \in \mathbb{R}} \sum_{t=1}^{T} \rho_\tau(Y_t - \beta),$$

where $\rho_\tau(x) = (\tau - 1(x < 0))\,x$. Koenker and Bassett[48] extended the concept to
the regression setting by solving

$$\min_{\beta \in \mathbb{R}^d} \sum_{t=1}^{T} \rho_\tau(Y_t - X_t'\beta). \qquad (3.4)$$

The $\hat\beta(\tau)$ that solves (3.4) has the equivariance and robustness properties of the
ordinary sample quantiles; in particular, (i) regression equivariance, (ii) scale equivariance,
(iii) equivariance to (full rank) linear transformations of $X$, (iv) invariance to
perturbations of $Y_t$ without crossing the hyperplane $x'\hat\beta(\tau)$. The solutions
$x'\hat\beta(\tau)$ to (3.4) (if unique) pass through $d$ points $(Y_t, X_t)$ and the function
$\tau \mapsto X'\hat\beta(\tau)$ is monotone in $\tau$.
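To make the check-function criterion concrete, here is a minimal brute-force sketch for tiny samples. It relies on the property stated above that a solution passes through $d$ data points, so for $d = 1$ it searches over the observations and for $d = 2$ (intercept and slope) over lines through pairs of observations. The function names and data are hypothetical, and this exhaustive search is not the linear-programming method used in practice.

```python
from itertools import combinations


def check_loss(u, tau):
    """rho_tau(u) = (tau - 1{u < 0}) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def sample_quantile(ys, tau):
    """No-covariates case: minimize sum_t rho_tau(y_t - b) over the data points."""
    return min(ys, key=lambda b: sum(check_loss(y - b, tau) for y in ys))


def regression_quantile_2d(ys, xs, tau):
    """Brute-force version of (3.4) for regressors (1, x_t): try every line
    through two observations and keep the one with the smallest total loss."""
    best = None
    for i, j in combinations(range(len(ys)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a regression line
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(y - intercept - slope * x, tau)
                   for y, x in zip(ys, xs))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# tau * T = 2.5, so the minimizer is the 3rd order statistic.
first_quartile = sample_quantile([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.25)

# Five points on y = 1 + 2x plus one large positive outlier at x = 3.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 17, 9, 11]
median_line = regression_quantile_2d(ys, xs, 0.5)
print(first_quartile, median_line)
```

The second example also illustrates property (iv): moving the outlier further up, without crossing the fitted line, leaves the fitted line unchanged.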


3.4     Extremes,  Near-Extremes  &  Data  Scarcity 

We view the sample regression quantiles as order statistics in regression settings. For
a given sample of size $T$, the $\tau$-th sample regression quantile is seen here as the
$\tau T$-th order statistic. Henceforth, we shall refer to $\tau T$ as the rank or order.

Definition 3.1 (Rank Conditions) The sequence of quantile index-sample size pairs
$(\tau_T, T)$ is said to be:

(i)   an extreme rank sequence, if $\tau_T \searrow 0$, $\tau_T T \to k > 0$,
(ii)  an intermediate rank sequence, if $\tau_T \searrow 0$, $\tau_T T \to \infty$,
(iii) a central rank sequence, if $\tau$ is fixed, and $T \to \infty$.
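The three kinds of sequences can be compared by tabulating the rank $\tau_T T$ along growing sample sizes. The particular sequences below ($\tau_T = k/T$ with $k = 5$, $\tau_T = T^{-1/2}$, and fixed $\tau = .1$) are hypothetical examples chosen to satisfy the rank conditions, not sequences prescribed by the theory.

```python
def rank(tau, T):
    """Rank (order) tau * T of the tau-th sample quantile in a sample of size T."""
    return tau * T


def extreme_index(T, k=5.0):
    """tau_T = k / T: tau_T decreases to 0 and tau_T * T stays at k."""
    return k / T


def intermediate_index(T):
    """tau_T = T**(-1/2): tau_T decreases to 0 while tau_T * T grows without bound."""
    return T ** -0.5


def central_index(T, tau=0.1):
    """tau fixed: the rank grows proportionally to T."""
    return tau


SAMPLE_SIZES = [10**2, 10**4, 10**6]
SEQUENCES = {
    "extreme": extreme_index,
    "intermediate": intermediate_index,
    "central": central_index,
}

ranks = {
    name: [rank(seq(T), T) for T in SAMPLE_SIZES]
    for name, seq in SEQUENCES.items()
}

for name, values in ranks.items():
    print(name, [round(v, 3) for v in values])
```

Along the extreme sequence the number of relevant tail observations stays bounded, along the intermediate sequence it grows more slowly than $T$, and along the central sequence it grows in proportion to $T$.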

Even though (i) and (ii) make $\tau$ sample size dependent, to simplify we write $\tau$
instead of $\tau_T$. Because principles (i) and (ii) constructively exploit that the relevant
data is formed by the tail events and/or is scarce, they lead to

a.  asymptotic distributions that either fit the finite-sample distribution better or,
given the same approximation quality, are more parsimonious relative to the
conventional central rank approximations (Koenker and Bassett[48], Powell[62]),

b.  important tail inference procedures, based on the sample regression quantiles.
See example 2.3 and section 7.

Concepts (i) and (ii) are well motivated by the intellectual and practical success of
extreme value theory, which focused on the ordinary sample quantiles. These concepts
are also similar in spirit to other types of "alternative" asymptotics, e.g. GMM
when the number of moment conditions is large, or, generally, the theory of statistical
experiments.

In evaluating these concepts, it is important to keep in mind that these alternative
sequences are designed to yield practically better approximations even when the quantile
index is not very low. And, in case (a), it is completely irrelevant whether or not
future sampling will lead to samples conforming to these sequences.

To clarify (a), consider a simple example with no $X$. Suppose, with an i.i.d.
sample $\{U_t, t \le T = 200\}$, we wish to infer about the quantiles with indices
$\tau = .025$, $\tau = .1$, $\tau = .2$, $\tau = .3$. The estimators are the order statistics
(sample quantiles) $U_{(5)}$, $U_{(20)}$, $U_{(40)}$, $U_{(60)}$. Suppose the distribution
$F_u$ has an algebraic tail $F_u(x) \sim (-x)^{-1/\xi}$, $\xi = 1$, as $x \searrow -\infty$.
Figure 2 compares the conventional central rank approximation
$\sqrt{T}\,(U_{(\tau T)} - F_u^{-1}(\tau)) \to_d N\big(0, \tau(1 - \tau)/f_u^2(F_u^{-1}(\tau))\big)$,
where $f_u = F_u'$, with the intermediate rank one:
$a_T (U_{(\tau T)} - F_u^{-1}(\tau)) \to_d N\big(0, \xi^2/(m^{-\xi} - 1)^2\big)$
($\xi = 1$, $m > 1$), $a_T = \sqrt{\tau T}/(F_u^{-1}(m\tau) - F_u^{-1}(\tau))$, and the extreme
rank approximation:
$T^{-\xi}(U_{(\tau T)} - F_u^{-1}(\tau)) \to_d k^{-\xi} - \Gamma_k^{-\xi}$, where $\Gamma_k$ is a
gamma random variable with degree $k$ (sum of $k$ standard exponentials, section 3.6).
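The ranks and the two normal approximations in this example can be worked out numerically. The sketch below assumes, beyond what the text states, that the tail is exactly of Pareto form, $F_u(x) = (-x)^{-1/\xi}$ for $x \le -1$ with $\xi = 1$, so that $F_u^{-1}(\tau) = -\tau^{-\xi}$ and $f_u(F_u^{-1}(\tau)) = \tau^{1+\xi}/\xi$. The pairing of $m$ with $\tau$ follows the values quoted in the caption of the QQ-plot figure, and all function names are hypothetical.

```python
import math

T = 200
XI = 1.0  # tail index of the example


def quantile(tau):
    """F^{-1}(tau) = -tau**(-XI) for F(x) = (-x)**(-1/XI), x <= -1 (assumed exact)."""
    return -tau ** (-XI)


def density_at_quantile(tau):
    """f(F^{-1}(tau)) = tau**(1 + XI) / XI for the same F."""
    return tau ** (1.0 + XI) / XI


def central_sd(tau):
    """Std. deviation of U_(tau T) implied by the central rank approximation."""
    return math.sqrt(tau * (1.0 - tau) / T) / density_at_quantile(tau)


def intermediate_sd(tau, m):
    """Std. deviation implied by the intermediate rank approximation with spacing factor m."""
    a_T = math.sqrt(tau * T) / (quantile(m * tau) - quantile(tau))
    return (XI / abs(m ** (-XI) - 1.0)) / a_T


taus = [0.025, 0.1, 0.2, 0.3]
ranks = [round(tau * T) for tau in taus]

# m = 2, 1.5, 1.25 for tau = .1, .2, .3, as quoted in the figure caption.
ratios = {
    tau: central_sd(tau) / intermediate_sd(tau, m)
    for tau, m in [(0.1, 2.0), (0.2, 1.5), (0.3, 1.25)]
}

print(ranks)
for tau, ratio in ratios.items():
    print(tau, round(ratio, 4))
```

Under this assumed $F_u$, the ratio of the two implied standard deviations reduces to $\sqrt{1 - \tau}$ whatever the value of $m$, so the two normal approximations nearly coincide for small $\tau$.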

Quality-wise, the extreme rank approximation, which exploits both the extremality
and scarcity of tail events, beats the normal quite considerably (displays A-C). Only for
a fairly non-extreme quantile, $\tau = .3$, does the normal approximation achieve roughly
the same quality. At the same time, the intermediate rank approximation, which
exploits the extremality of relevant events, is very close to the central rank approximation
(displays D-F), but enjoys greater parsimony and ease of inference. The tail index $\xi$ is
easy to estimate, and the scaling $a_T$ is estimated by the sample interquantile spacing
(see section 3.6). This may be preferred to the nonparametric estimation of the density
function evaluated at the low quantile (with the scarce tail data), as required in the
central rank theory. See [13] for a Monte-Carlo regression example.

3.5  Tail  Types,  Support  Types,  and  Classical  Limits 

The  following  definitions  are  important  in  the  sequel. 

Definition 3.2 (Types of Support) In view of linearity, we say $F_Y(\cdot \mid X)$ has:

•  finite support, if $Q_Y(0 \mid X) > -\infty$, a.s.

•  infinite support, if $Q_Y(0 \mid X) = -\infty$, a.s.

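Definition 3.3 (Tail Types, Tail Index, Regular variation) Consider a random
variable $U$ with distribution function $F_u$, with lower end-point $x_F$ equal 0 or
$-\infty$. $F_u$ has the tail of the extremal types 1, 2, or 3 if [$f \sim g$ if $f/g \to 1$]

$$
\begin{aligned}
&\text{type 1: } (\xi = 0): && \text{as } t \searrow 0 \text{ or } -\infty, && F_u(t + x\,\ell(t)) \sim F_u(t)\,e^{x}, && \forall x \in \mathbb{R}, \\
&\text{type 2: } (\xi = 1/\alpha): && \text{as } t \searrow -\infty, && x^{-\alpha} F_u(t) \sim F_u(tx), && \forall x > 0,\ \alpha > 0, \\
&\text{type 3: } (\xi = -1/\alpha): && \text{as } t \searrow 0, && x^{\alpha} F_u(t) \sim F_u(tx), && \forall x > 0,\ \alpha > 0,
\end{aligned}
\qquad (3.5)
$$

where $\ell(t) = \int_{x_F}^{t} F_u(v)\,dv / F_u(t)$, for $t > x_F$, cf. [52]. Enclosed in the
brackets in (3.5) is the tail index $\xi$, which determines the tail type.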

Equation (3.5) defines type 2 distributions as regularly varying functions at $-\infty$
with index $-1/\xi = -\alpha$ (algebraically and near-algebraically tailed at $-\infty$, in more
intuitive terms). (3.5) also defines type 3 distributions as regularly varying functions
at 0 with index $-1/\xi = \alpha > 0$ (algebraically and near-algebraically tailed at a finite
end point, taken as 0). The type 1 class includes exponentially and near-exponentially
tailed distributions. For future reference, note these conditions imply that the quantile
function $F_u^{-1}(\tau)$ is regularly varying at 0 with index $-\xi$. E.g. [26].
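The type 2 condition and the implied regular variation of the quantile function can be checked numerically on a concrete distribution. The sketch below uses a hypothetical near-algebraic lower tail, $F(x) = 1/(1 + (-x)^{\alpha})$ for $x < 0$ with $\alpha = 2$ (so $\xi = 1/2$). This distribution is an assumption made for illustration only and does not come from the text.

```python
ALPHA = 2.0        # hypothetical tail exponent
XI = 1.0 / ALPHA   # implied tail index for a type 2 tail


def cdf(x):
    """Lower tail of a log-logistic-type law: F(x) = 1 / (1 + (-x)**ALPHA), x < 0."""
    return 1.0 / (1.0 + (-x) ** ALPHA)


def quantile(tau):
    """Inverse of cdf on (0, 1): F^{-1}(tau) = -((1 - tau) / tau)**(1 / ALPHA)."""
    return -((1.0 - tau) / tau) ** (1.0 / ALPHA)


def tail_ratio(t, x):
    """F(t x) / F(t), which should approach x**(-ALPHA) as t decreases to -infinity."""
    return cdf(t * x) / cdf(t)


def quantile_ratio(tau, m):
    """F^{-1}(m tau) / F^{-1}(tau), which should approach m**(-XI) as tau decreases to 0."""
    return quantile(m * tau) / quantile(tau)


for t in (-1e1, -1e2, -1e4):
    print(t, tail_ratio(t, 3.0), 3.0 ** (-ALPHA))

for tau in (1e-2, 1e-4, 1e-8):
    print(tau, quantile_ratio(tau, 4.0), 4.0 ** (-XI))
```

Both ratios settle at their limits only far out in the tail, which is one reason the text speaks of "near-algebraically" tailed distributions rather than exact power laws.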

Classes 1-3 contain most smooth distributions, with rare exceptions. See [26].
The tail types determine the limiting distributions of order statistics under extreme
and intermediate ranks. In our setting, they will have a similar role as well. For later
comparisons, let us review the non-regression results.

3.6  Limit  Distributions  of  Ordinary  Sample  Quantiles 

Extreme Rank Statistics. Consider the order statistics $U_{(1)} \le \ldots \le U_{(T)}$ from the i.i.d. sample $U_1, \ldots, U_T$, distributed according to law $F_u$, with the lower end-point $x_f$ equal to 0 or $-\infty$. Extreme value theory describes the existence and forms of the non-degenerate limit laws for the properly normalized order statistics.


[Figure 2 panels: A. $\tau = .025$, $T = 200$, rank = 5; B. $\tau = .2$, $T = 200$, rank = 40; C. $\tau = .3$, $T = 200$, rank = 60]

Figure 2: Displays A-C: QQ-plot of Extreme and Central Rank Approximations. The dashed line is the central rank approximation, and the dotted line is the extreme rank approximation. The true quantiles of the exact sampling distribution are depicted by the solid line. The central rank approximation varies from very bad to bad for the low quantiles $\tau = .025$ and $\tau = .2$ and becomes comparable to the extreme rank approximation only at $\tau = .3$. Displays D-F: QQ-plot of Intermediate and Central Rank Approximations. The dotted line now denotes the intermediate rank approximation. The theoretical central and intermediate rank approximations have approximately the same performance for $\tau = .1, .2, .3$ (using $m = 2, 1.5, 1.25$). The practical advantage of the intermediate rank approximation is the parsimony and ease of estimating nuisance parameters. [Replications = 10,000. QQ plots are over the 99% range.]

For fixed $k$, the limit laws $J_k$ of the normalized order statistics $a_T(U_{(k)} - b_T)$ for tails of types 1-3 were identified in the literature as:

$$J_k = \begin{cases} \ln\Gamma_k, & \text{for type 1 tails}, \\ -\Gamma_k^{-1/\alpha}, & \text{for type 2 tails}, \\ \Gamma_k^{1/\alpha}, & \text{for type 3 tails}, \end{cases} \tag{3.6}$$

where $\Gamma_k = \mathcal{E}_1 + \ldots + \mathcal{E}_k$ is a sum of i.i.d. unit-exponential variables,


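with the canonical scalings given by:

$$\begin{array}{lll}
\text{type 1:} & a_T = 1/\ell[F_u^{-1}(1/T)], & b_T = F_u^{-1}(1/T), \\
\text{type 2:} & a_T = -1/F_u^{-1}(1/T), & b_T = 0, \\
\text{type 3:} & a_T = 1/F_u^{-1}(1/T), & b_T = 0.
\end{array} \tag{3.7}$$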


Note that when $k = 1$, the type 1 law in (3.6) is called Gumbel, the type 2 law Fréchet, and the type 3 law Weibull. Typically, the results state the distribution functions of $J_k$, but more recent treatments formulate the results in the above form (e.g. Example 4.2.5 in [26]), which helps explain our results.
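The fixed-$k$ limit can be illustrated by simulation. The following is a minimal sketch under assumptions that are not in the text: the errors are Uniform(0,1), a type 3 tail with $\alpha = 1$, so the canonical scaling is $a_T = 1/F_u^{-1}(1/T) = T$ and the limit of $T \cdot U_{(k)}$ is $\Gamma_k$. The function names, sample sizes, and seed are arbitrary illustrative choices.

```python
import numpy as np

def scaled_kth_order_stat(T, k, reps, rng):
    """Draw a_T * U_(k) for Uniform(0,1) samples of size T, with a_T = 1/F^{-1}(1/T) = T."""
    u = rng.random((reps, T))
    kth = np.partition(u, k - 1, axis=1)[:, k - 1]   # k-th smallest value in each row
    return T * kth

def gamma_limit(k, reps, rng):
    """Draw Gamma_k = E_1 + ... + E_k with i.i.d. unit-exponential E_j."""
    return rng.exponential(1.0, size=(reps, k)).sum(axis=1)

rng = np.random.default_rng(0)
finite_sample = scaled_kth_order_stat(T=1000, k=3, reps=5000, rng=rng)
limit_draws = gamma_limit(k=3, reps=5000, rng=rng)

# Both sets of draws should have mean and variance close to k = 3.
print("finite sample:", finite_sample.mean(), finite_sample.var())
print("limit law:    ", limit_draws.mean(), limit_draws.var())
```

Comparing the two sets of draws (for example with a QQ-plot) gives an informal version of the extreme rank approximation in the no-regressor case.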

Intermediate Rank Statistics. One of the most general and fairly recent treatments of the intermediate order statistics is the work of Dekkers and de Haan[21]. Using slightly stronger restrictions on the tails, discussed below, they found that the limit laws are normal, but the limiting variance depends on the extremal tail types through the tail index $\xi$, as $k = [\tau T] \to \infty$ (Theorem 3.1):

$$a_T\left(U_{([\tau T])} - F_u^{-1}(\tau)\right) \xrightarrow{d} N\left(0, \frac{\xi^2}{(2^{-\xi}-1)^2}\right), \qquad a_T \equiv \frac{\sqrt{\tau T}}{F_u^{-1}(2\tau) - F_u^{-1}(\tau)}. \tag{3.8}$$


The scaling $a_T$ can conveniently be replaced by $\sqrt{\tau T}/(U_{([2\tau T])} - U_{([\tau T])})$ without affecting the result, which makes the inference operational.
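As a rough plausibility check of (3.8), one can work through the uniform distribution on $(0,1)$. This example is an illustrative assumption and is not taken from the text; it only verifies that the variance formula gives the expected value in one simple case.

```latex
% Worked check of (3.8) for U ~ Uniform(0,1) (illustrative assumption).
% Here F_u^{-1}(\tau) = \tau, a type 3 tail with \alpha = 1, so \xi = -1.
\begin{align*}
a_T &= \frac{\sqrt{\tau T}}{F_u^{-1}(2\tau) - F_u^{-1}(\tau)}
     = \frac{\sqrt{\tau T}}{\tau} = \sqrt{T/\tau}, \\
\operatorname{Var}\big(U_{([\tau T])}\big) &\approx \frac{\tau(1-\tau)}{T}
  \quad \text{(since } U_{(k)} \sim \mathrm{Beta}(k,\, T-k+1)\text{)}, \\
\operatorname{Var}\big(a_T\, U_{([\tau T])}\big) &\approx \frac{T}{\tau}\cdot\frac{\tau(1-\tau)}{T}
  = 1 - \tau \;\to\; 1 \quad \text{as } \tau \searrow 0, \\
\frac{\xi^2}{(2^{-\xi}-1)^2} &= \frac{(-1)^2}{(2^{1}-1)^2} = 1 .
\end{align*}
```

The finite-sample variance and the limiting variance agree in the limit, as the displayed result requires for this case.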

4     The  Extremal  Regression  Quantile  Models 

Here  we  construct  the  linear  models  of  low  (extremal)  conditional  quantiles,  which  allow 
flexible  covariate  effects  on  the  distribution,  and  coherently  combine  the  tail  conditions 
leading  to  (i)  non-degenerate  asymptotic  distributions  and  congenial  inference  proce- 
dures, (ii)  good  approximations  to  the  sampling  distributions,  and  (iii)  a  framework 
suitable  for  inference  about  tails  in  shape-heteroscedastic  models. 

4.1      Model  1:  Tail  Homogeneity 

Consider a probability space $(\Omega, \mathcal{F}, P)$, possibly indexed by $T$. To impose a constructive tail condition, define a reference error term as

$$U \equiv Y - X'\beta_r, \tag{4.9}$$

where $x \mapsto x'\beta_r$ is the reference line chosen so that the error $U$ satisfies the tail homogeneity condition (i) in assumption 1. The existence of such a line is an assumption; the examples below highlight its constructive role.

In the bounded support case, it is convenient to choose the reference line as

$$\beta_r \equiv \beta(0), \tag{4.10}$$

so that $U \equiv Y - X'\beta(0) \ge 0$ has the end-point 0 by construction. (In the unbounded support case, $x'\beta(0) = -\infty$ and is not suitable as a reference line.)

Assumption 1 (Model 1: Tail Homogeneity) In addition to linearity (3.2):
(i) there is a real-valued $U$ and a reference line $x'\beta_r$ of the form (4.9)-(4.10) s.t.

$$F_U(z|x) \sim F_u(z),$$

as $z \searrow -\infty$ (infinite support) or as $z \searrow 0$ (finite support), uniformly in $x \in \mathbf{X}$. $F_u$ is a distribution function with type 1, 2, or 3 tails.

(ii) the support of $X$ is (or is trimmed to) a compact subset $\mathbf{X}$ of $\mathbb{R}^d$.

(iii) the distribution function of $X_t$, $F_X$, with support $\mathbf{X}$ and mean $\mu_X$, is nondegenerate in $\mathbb{R}^d$. The first component of $X$ is 1. $\mu_X = (1, 0, \ldots)'$ w.l.o.g.

Assumption 1-(ii), compactness, is essential; otherwise, the limits may change depending on the tail behavior of $X$. 1-(iii) precludes degeneracies.

Assumption 1-(i) requires the tails of the suitably defined error term $U$ to be in the domain of minimum attraction, which is fairly broad with rare exceptions (section 3.5). In this sense the model is distribution-free. 1-(i) also requires the tail of the conditional distribution function of $U$ to be approximately independent of $X$. This incorporates the case of independent $U$ and $X$ as strictly a special case. Indeed, 1-(i) requires only that there is a reference error $U$ in (4.9) such that the extremal (small) values of $U$ are approximately independent of $X$. This allows general global dependence of $U$ on $X$, such as shape heteroscedasticity.

Example 4.1 (Classical linear model) Suppose the quantile function is

$$Q_Y(\tau|X) = X'\alpha + F_u^{-1}(\tau), \tag{4.11}$$

which corresponds to the model $Y = X'\alpha + U$, where $U$ is independent of $X$ and e.g. $EU = 0$. This clearly is a special case of Model 1 with the reference line $x'\alpha$. Yet this example is narrow and "trivial" in the sense that the extremal features are determined by the central features of the distribution, and there is "nothing to estimate" (all slope coefficients $\beta_{-1}(\tau)$ equal $\alpha_{-1}$). To defend the "trivial" model, note it underlies much of the (central) quantile regression inference, Koenker and Bassett[48], because it often plausibly approximates the exact distribution of regression quantiles, even though the model itself is unrealistic (Koenker and Hallock[50]).

Example 4.2 Consider the bounded support case. The 0-th quantile function is $Q_Y(0|X) = X'\beta(0)$, which is our reference line $X'\beta_r$. By assumption 1(i),

$$P(Y - X'\beta(0) \le l\,|\,X) \sim F_u(l) \equiv \tau_l, \text{ as } l \searrow 0,$$

which implies that the paths of the extremal quantile functions $x \mapsto x'\beta(\tau_l)$ are approximately parallel to that of $x \mapsto x'\beta(0)$. This model is not "trivial" in the sense of Example 4.1, because the extremal quantiles and the reference line are determined only by the extremal features of the conditional distribution. The model does not restrict other quantiles, allowing for general shape-heteroscedasticity. For example, a collection of quantile curves $\{x \mapsto Q_Y(q|x), q \in C\}$ with the central indices $C$ may have complicated non-parallel paths, allowing for complicated effects of covariates on the conditional density shape (kurtosis, skewness, and other effects), as in Figure 3. Consequently, this model does not admit reductions to non-regression models by removing a conditional location and scale from $Y$.


[Figure 3 plot labels: central quantiles $Q_Y(q|x)$, $q \in C$; low quantile $Q_Y(\tau|x) = x'\beta(\tau)$; reference line.]

Figure 3: Example 4.2: The extremal conditional quantile function $x \mapsto x'\beta(\tau)$ is approximately parallel to the reference line $x \mapsto x'\beta_r$ (equal to the minimal quantile $x'\beta(0)$ in the bounded support case). Other quantile functions are unrestricted, allowing for complicated forms of global heteroscedasticity. The model does not admit the reduction to a non-regression model by removing the conditional median (or mean) and/or scale from the $Y$ variable.

Example 4.3 Consider the unbounded support case. For some reference line $x \mapsto x'\beta_r$, by assumption 1(i),

$$P(Y - X'\beta_r \le l\,|\,X) \sim F_u(l) \equiv \tau_l, \text{ as } l \searrow -\infty,$$

which implies that the paths of the extremal quantile functions $x \mapsto x'\beta(\tau_l)$ are approximately parallel to that of $x \mapsto x'\beta_r$. This model is also not "trivial" in the sense of example 4.1, because the extremal quantiles are determined only by the extremal features of the conditional distribution. As in example 4.2, the model does not restrict any other features of the distribution, allowing for general forms of global heteroscedasticity. Thus it is irreducible to a non-regression model.

Note  that  examples  4.2  and  4.3  demonstrate  that  the  linear  location-scale  models 
are  neither  implied  by  Model  1  nor  imply  Model  1.  Thus  Model  1  is  of  its  own  nature, 
crafted  to  yield  non-degenerate,  parsimonious  limits.  Unlike  the  location-scale  models, 
Model  1  admits  general  global  heteroscedasticity,  allowing  covariates  to  affect  the  shape 
of  the  conditional  distribution. 

4.2     Model  2:  Congenial  Tail  Heterogeneity 

We suggest a model that, while flexibly accounting for the dependence of the tail on covariates, exhibits simplicity, enabling an explicit, practical limit theory for both the extreme and intermediate rank sample regression quantiles.

Assumption 2 (Model 2: Congenial Tail Heterogeneity) Suppose assumption 1 holds, except 1-(i) is replaced by the following tail condition:

$$F_U(z|x) \sim K(x)\cdot F_u(z), \text{ as } z \searrow -\infty \text{ or } z \searrow 0, \tag{4.12}$$

uniformly in $x \in \mathbf{X}$, $F_u$ has type 1-3 tails, $K(\cdot)$ is assumed to be a positive continuous function on $\mathbf{X}$, bounded above and away from zero, normalized so that $K(\mu_X) = 1$ (or so that $K$ equals 1 at any other reference point in $\mathbf{X}$).

Just like Model 1, Model 2 is distribution-free, since $F_u$ is not assumed to be parametric, and it allows the general (shape) forms of global heteroscedasticity. Unlike Model 1, Model 2 allows for richer effects of covariates on tails.

The  imposed  tail  condition  may  seem  an  unconventional  way  to  introduce  het- 
eroscedasticity. Yet,  in  many  regards,  it  is  more  flexible  and  constructive  than  the  con- 
ventional location-scale  modeling,  as  explained  below.  The  proposed  modeling  strategy 
is  motivated  by  the  closure  of  the  domains  of  minimum  attraction  under  tail  equiva- 
lence, and  is  fully  consistent  with  linearity. 

Indeed, Lemma 10 characterizes this model in detail: (i) implications for the quantile coefficients of the linear model, (ii) limits of ratios of spacings between the conditional quantile functions, and (iii) many other properties needed for inference. Importantly, we deduce that the linearity assumption and (4.12) jointly imply that $K(\cdot)$ can be represented as

$$K(x) = \begin{cases} e^{-x'c}, & \text{for type 1 tails}, \\ (x'c)^{\alpha}, & \text{for type 2 tails}, \\ (x'c)^{-\alpha}, & \text{for type 3 tails}, \end{cases} \tag{4.13}$$

where $\mu_X'c = 1$ for type 2 and 3 tails, and $\mu_X'c = 0$ for type 1 tails. In Model 1, $c = 0$ for type 1 tails, and $c = (1, 0, \ldots)' = e_1$ for type 2 and 3 tails. We call $c$ the tail heterogeneity index. It measures the strength with which $X$ shifts the tails of the error term $U$. Note that $x'c > 0$ uniformly on $\mathbf{X}$ for types 2 and 3 by assumption.

It is notable that the (potentially) non-parametric function $K(\cdot)$ in (4.12) is in fact a transformation of the linear index $x'c$ determined by the tail index $\xi$. Recall $\xi = 0$ (for type 1 tails) and $\xi = 1/\alpha$ and $-1/\alpha$ for type 2 and 3 tails, respectively. This property leads to parsimonious, convenient limits for regression quantiles.

The  following  examples  illustrate  the  model's  flexibility. 
Example 4.4 (Linear Location-Scale Model) Assume for $X'\gamma > 0$ a.s.

$$Q_Y(\tau|x) = x'\alpha + x'\gamma\cdot F_v^{-1}(\tau), \tag{4.14}$$

corresponding to the location-scale model $Y = X'\alpha + X'\gamma\cdot V$, where $V$ is independent of $X$ and, say, has mean 0 and variance 1. Assume $F_v$ has the extremal tail type with $\xi \neq 0$. Then for the reference line $x'\alpha$ and $U \equiv Y - X'\alpha = X'\gamma\cdot V$

$$P(X'\gamma\cdot V \le l\,|\,X) \sim (X'\gamma)^{1/\xi}\cdot F_v(l), \text{ as } l \searrow -\infty,$$

so the conditions of Model 2 are satisfied with $F_u = F_v$. The location-scale model imposes two stringent restrictions: (i) the extremal features of the distribution are largely determined by the (central) location and scale parameters: $\beta(\tau) = \alpha + \gamma\cdot F_v^{-1}(\tau)$, and (ii) the covariates are limited to affect only the location and scale of the conditional distribution, precluding shape effects like skewness or kurtosis.
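The tail factor in Example 4.4 can be traced back to the definition of a type 2 tail. The short derivation below is a sketch for the type 2 case only, using the regular variation property in (3.5); the final remark about normalization is an observation added here, not a statement from the text.

```latex
% Sketch: tail factor in the location-scale model for a type 2 tail (\xi = 1/\alpha > 0).
% Fix x with s \equiv x'\gamma > 0 and let l \searrow -\infty.
\begin{align*}
P(x'\gamma \cdot V \le l)
  &= F_v\!\left(l \cdot s^{-1}\right)
     && \text{(divide by } s > 0\text{)} \\
  &\sim \left(s^{-1}\right)^{-\alpha} F_v(l)
     && \text{(type 2 in (3.5): } F_v(t y) \sim y^{-\alpha} F_v(t),\ y = s^{-1}\text{)} \\
  &= (x'\gamma)^{\alpha}\, F_v(l) = (x'\gamma)^{1/\xi}\, F_v(l).
\end{align*}
% Hence K(x) = (x'\gamma)^{\alpha}, which has the form (x'c)^{\alpha} in (4.13)
% with c proportional to \gamma. The normalization K(\mu_X) = 1 holds when
% \mu_X'\gamma = 1; otherwise the constant can be absorbed into F_u.
```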

Example 4.5 Model 2 requires that for some reference line $x'\beta_r$

$$P(Y - X'\beta_r \le l\,|\,X) \sim K(x)\cdot F_u(l), \text{ as } l \searrow 0 \text{ or } -\infty,$$

which implies that the paths of the extremal quantile functions $x \mapsto x'\beta(\tau_l)$ are no longer parallel to that of $x \mapsto x'\beta_r$. (The crossing of lines is precluded because the assumption is consistent with linearity, Lemma 10.) This model is not as restrictive as example 4.4. First, the extremal quantiles (and the reference line) are determined only by the extremal features of the conditional distribution. Second, the model allows for general global heteroscedasticity: the entire shape of the conditional density may change with covariates (scale, skewness, etc.), including the tails.


Figure 4: Example 4.5. Extremal quantile functions $x \mapsto x'\beta(\tau)$ are no longer approximately parallel to the reference line $x \mapsto x'\beta_r$ over $\mathbf{X}$, allowing for tail heteroscedasticity. Other quantile functions are unrestricted, allowing for complicated forms of global heteroscedasticity as well. The extremal features of the model, including the reference lines, are not determined by the central features.

This  discussion  concludes  the  construction  of  our  models.  In  principle,  it  should  be 
possible  to  further  relax  the  modeling  assumptions,  particularly  in  the  nonparametric 
direction,  but  a  good  deal  of  caution  is  needed  to  assure  the  joint  coherency  of  the 
tail  conditions,  functional  dependence  of  the  quantile  curves  on  regressors,  and  non- 
degeneracy  of  the  limit  distributions.  Note  that  the  obtained  models  are  coherent, 
flexible,  distribution-free,  lead  to  non-degenerate,  parsimonious  limit  distributions,  and 
provide  a  convenient  framework  for  inference  about  the  tails. 

5     Asymptotics  under  Extreme  Ranks 

Recall that the approximation concept of extreme ranks requires $\tau T \to k > 0$. Here we state the distribution results for Models 1 and 2 and explain their barest essence, while leaving proofs and some generalizations to the appendix.

5.1. A Sketch. First, obtain a finite-dimensional (fi-di) weak limit $Q_\infty(\cdot)$ of the finite-sample, suitably scaled, objective functions $\{Q_T(\cdot)\}$. $Q_\infty$ is defined by a point process that "counts" the "extremal events." Then, the normalized regression quantile statistic, $Z_T$, an argmin of $Q_T$, will converge in distribution to a random variable $Z_\infty$, the argmin of $Q_\infty$, by convexity of $\{Q_T\}$ and $Q_\infty$.

For brevity, we confine our discussion to type 3 tails. Consider the statistic

$$Z_T \equiv a_T(\hat\beta(\tau) - \beta(0)),$$

where $a_T$ is the canonical scaling in section 3.6, defined in terms of the function $F_u$ in assumption 1 or 2. $Z_T$ minimizes the objective function in (3.4), rescaled by $a_T$:

$$Q_T(z) \equiv \sum_{t=1}^{T}\left(a_T\,\rho_\tau\!\left(U_t - X_t'z/a_T\right) - a_T\tau U_t\right), \tag{5.15}$$

where $U_t \equiv Y_t - X_t'\beta(0) \ge 0$ and $z \equiv a_T(\beta - \beta_r)$. We subtracted the "smoother" $\sum_t \tau U_t a_T$, which brings a key continuity property and stabilizes $Q_T$. Clearly this does not affect the argmin $Z_T$. [The "smoother" for type 1-2 tails is more involved; see Lemma 1. Incidentally, the conventional central rank stabilization by $\sum_t \rho_\tau(U_t a_T)$ is bad, for it sends the objective to $+\infty$, let alone continuity.] Hence

$$Q_T(z) = -\tau T\bar{X}'z - \sum_{t=1}^{T} 1(U_t a_T \le X_t'z)\,(U_t a_T - X_t'z). \tag{5.16}$$

This function is convex. Notably, it is constructed as a continuous functional of the point process defined next. The fi-di distribution of $Q_T$ is defined by that of $(Q_T(z_j), j \le l)$ for any finite $(z_j, j \le l)$. Since $\bar{X} \xrightarrow{p} \mu_X$ and $\tau T \to k$ (for $j \le l$):

$$Q_T(z_j) = -k\mu_X'z_j - \sum_{t=1}^{T} 1(U_t a_T \le X_t'z_j)\,(U_t a_T - X_t'z_j) + o_p(1). \tag{5.17}$$
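The step from the rescaled check-function objective to the indicator form can be verified numerically. The sketch below assumes the standard check function $\rho_\tau(u) = (\tau - 1(u \le 0))u$ and uses simulated data; the uniform errors, the design, the evaluation point `z`, and all function names are illustrative assumptions rather than anything specified in the text.

```python
import numpy as np

def check_function(u, tau):
    """Check function rho_tau(u) = (tau - 1(u <= 0)) * u."""
    return (tau - (u <= 0)) * u

def objective_515(z, U, X, tau, a_T):
    """Rescaled objective: sum_t [a_T rho_tau(U_t - X_t'z/a_T) - a_T tau U_t]."""
    resid = U - X @ z / a_T
    return np.sum(a_T * check_function(resid, tau) - a_T * tau * U)

def objective_516(z, U, X, tau, a_T):
    """Indicator form: -tau T Xbar'z - sum_t 1(a_T U_t <= X_t'z)(a_T U_t - X_t'z)."""
    T = len(U)
    index = X @ z
    scaled = a_T * U
    indicator = scaled <= index
    return -tau * T * (X.mean(axis=0) @ z) - np.sum(indicator * (scaled - index))

rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.random(T)])  # intercept plus one bounded covariate
U = rng.random(T)          # Uniform(0,1) errors: end-point 0, type 3 tail with alpha = 1
tau = 5 / T                # extreme rank: tau * T = 5
a_T = float(T)             # canonical scaling 1 / F_u^{-1}(1/T) = T for uniform errors
z = np.array([4.0, 2.0])   # arbitrary evaluation point

print(objective_515(z, U, X, tau, a_T), objective_516(z, U, X, tau, a_T))
```

The two printed values should coincide up to floating-point error, since the second form is just an algebraic rearrangement of the first.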

The limit behavior of $Q_T$ is determined by the point process $\hat{N}$ that assigns mass to measurable sets $A$ by:

$$\hat{N}(A) \equiv \sum_{t=1}^{T} 1(\{a_T U_t, X_t\} \in A), \text{ for } A \subset E \equiv [0,\infty)\times\mathbf{X}.$$

The point process $\hat{N}$ is a measure defined by its random points (atoms) $(a_T U_t, X_t, t \le T)$ (see Definitions A.1, B.1). We find that $Q_T(\cdot)$ is an integral of a residual function with respect to the point process, which seems to be special to this problem.

Point process theory is the bread-and-butter of extreme value theory [26] (see the footnote below), and is useful here. Indeed, a Lebesgue-Stieltjes integral $\int g\,dN$ of $N$ with points $\{X_j\}$ is:

$$\int g(x)\,dN(x) = \sum_{j} g(X_j).$$


Footnote: Point process theory was developed by Kallenberg[44], Resnick[65] and others in considerable generality. Applications are numerous in statistics. For example, Feigin and Resnick[29] approximate the constraints of the linear programming estimators; see also Knight[46]. Embrechts et al.[26] and Resnick[65] show how point processes may be used in related applications, particularly the exceedance processes, extremal processes, and record values.

Convergence of such integrals, for continuous maps $x \mapsto g(x)$ that vanish outside compact sets, metrizes weak convergence of point processes (Definition A.2). So we represent $(Q_T(z_j), j \le l)$ as an integral

$$Q_T(z_j) = -k\mu_X'z_j + \int (l - x'z_j)^{-}\,d\hat{N}(l, x) + o_p(1), \tag{5.18}$$

where $(l - x'z)^{-}$ is the "residual" function. Lemma 2 proves $(Q_T(z_j), j \le l)$ is a continuous map of the point process $\hat{N}$, so its weak limit is determined by that of $\hat{N}$. Notably, to obtain continuity for various tail types, the construction of the point process $\hat{N}$ requires a careful choice of the underlying topological space $E$ (and additional transformations of $Q_T$ for types 1 and 2).

The weak limit (Def. A.2) of $\hat{N}$ in Model 2 is a Poisson process $N$, Lemma 6:

$$N(\cdot) = \sum_{i=1}^{\infty} 1(\{J_i, X_i\} \in \cdot),$$

where $\{J_i, X_i\}$ are random points defined as

$$(J_i, X_i, i \ge 1) = (X_i'c\cdot\Gamma_i^{1/\alpha}, X_i, i \ge 1), \qquad \Gamma_i = \mathcal{E}_1 + \ldots + \mathcal{E}_i, \quad i \ge 1, \tag{5.19}$$

where $\{\mathcal{E}_i\}$ are i.i.d. exponential random variables with mean 1, $\{X_i\}$ are i.i.d. with law $F_X$, distributed independently of $\{\mathcal{E}_i\}$, and $c$ is the tail heterogeneity parameter. In Model 1, because $c = (1, 0, \ldots)'$, a natural simplification occurs:

$$X_i'c = 1, \ \forall i. \tag{5.20}$$

The first result, explained in Lemma 6, is not self-evident, while (5.20) is fairly intuitive. Note that $\hat{N}(A) = \sum_{i \le T} 1(\{a_T U_{(i)}, X_{(i)}\} \in A)$, where $U_{(i)}$ is the $i$-th rank error, and $X_{(i)}$ is the corresponding covariate. The vector $(a_T U_{(i)}, i \le q) \xrightarrow{d} (\Gamma_i^{1/\alpha}, i \le q)$ (Section 3.6), and is asymptotically independent of the corresponding covariates by Assumption 1-(i) in Model 1, which explains the form of (5.19) and (5.20) for Model 1. (This is not a proof.) Lemmas 5-6 provide the proof for Model 2 (and 1 by implication) using Kallenberg's theorem, Meyer's conditions, and a series of compositions and transformations of a canonical Poisson process (Def. A.4 provides the background).

We conclude that the fi-di weak limit of $\{Q_T\}$ is

$$Q_\infty(z) = -k\mu_X'z + \int_E (j - x'z)^{-}\,dN(j, x) = -k\mu_X'z + \sum_{i=1}^{\infty}(J_i - X_i'z)^{-}.$$

Therefore, we obtain by the convexity Lemma 1 in the appendix (Theorem 5 in Knight[46]) the limit distribution for $Z_T$, provided $Q_\infty(\cdot)$ has a unique argmin a.s. and is finite on an open non-empty set (verified in Lemmas 2 and 11). Hence

$$a_T(\hat\beta(\tau) - \beta(0)) \xrightarrow{d} Z_\infty \equiv \arg\min_z Q_\infty(z). \tag{5.21}$$

Finally, Lemma 10 shows that $a_T(\beta(0) - \beta(\tau)) \to -k^{1/\alpha}c$ in Model 2 ($c = e_1 = (1, 0, \ldots)'$ in Model 1) so that

$$a_T(\hat\beta(\tau) - \beta(\tau)) \xrightarrow{d} -k^{1/\alpha}c + Z_\infty.$$


5.2. Results for Models 1 and 2. The above discussion hopefully provided an intuitive explanation of the following formal results for Models 1 and 2 (Theorems 1 and 2). The proofs are in the appendix. To state the result, suppose we have $l$ sequences $\{\tau_i, i \le l\}$ such that $\tau_i T \to k_i$, so we index the normalized regression quantile statistic as $Z_T(k_i)$, for both $T < \infty$ and $T = \infty$. Define

$$Z_T(k) \equiv a_T(\hat\beta(\tau) - \beta_r - b_T e_1), \text{ for type 1 tails},$$

$$Z_T(k) \equiv a_T(\hat\beta(\tau) - \beta_r), \text{ for type 2 \& 3 tails}.$$

Also define the centered statistic

$$Z_T^c(k) \equiv a_T(\hat\beta(\tau) - \beta(\tau)).$$

The canonical constants $(a_T, b_T)$ are defined in (3.7) in terms of the function $F_u$, which is defined in Assumptions 1 and 2 along with the error term $U_t$ and $\beta_r$.

The key point process, $\hat{N}(\cdot) \equiv \sum_{t \le T} 1(\{a_T(U_t - b_T), X_t\} \in \cdot)$, weakly converges (Def. A.2) to $N(\cdot) \equiv \sum_{i \ge 1} 1(\{J_i, X_i\} \in \cdot)$ by Lemmas 4-6, with points $\{J_i, X_i\}$ defined as:

$$(J_i, X_i, i \ge 1) = \begin{cases} (\ln(\Gamma_i) + X_i'c, \ X_i) & \text{for type 1}, \\ (-\Gamma_i^{-1/\alpha}X_i'c, \ X_i) & \text{for type 2}, \\ (\Gamma_i^{1/\alpha}X_i'c, \ X_i) & \text{for type 3}, \end{cases} \quad i \ge 1, \tag{5.22}$$

where $\{\Gamma_i, i \ge 1\} = \{\sum_{j \le i}\mathcal{E}_j, i \ge 1\}$, $\{\mathcal{E}_j\}$ is an i.i.d. sequence of unit-exponential variables; $\{X_i\}$ is an i.i.d. sequence with law $F_X$. In Model 1, the dependence between $J_i$ and $X_i$ naturally disappears in view of assumption 1-(i):

$$X_i'c = 0 \text{ for type 1 tails}, \ \forall i, \qquad X_i'c = 1 \text{ for type 2 \& 3 tails}, \ \forall i. \tag{5.23}$$

Theorem 1 (Extreme Rank Asymptotics in Model 1) Suppose Assumption 1 holds and that (a) $\{Y_t, X_t\}$ is an i.i.d. or stationary sequence, satisfying the Meyer conditions, Lemma 6; (b) at least one component of $X$ is absolutely continuous, if $d \ge 2$. Then as $\tau T \to k$, $T \to \infty$ ($k = k_1, \ldots, k_l$), for a.e. $k > 0$

$$Z_T(k) \xrightarrow{d} Z_\infty(k) \equiv \arg\inf_{z}\left[-k\mu_X'z + \int_E l(u, x'z)\,dN(u, x)\right],$$

where $l(u, v) \equiv 1(u \le v)(v - u)$, and the distribution of points $\{J_i, X_i\}$ of $N$ is defined in (5.22)-(5.23). Furthermore, $(Z_T(k_i), i \le l) \xrightarrow{d} (Z_\infty(k_i), i \le l)$,

$$Z_T^c(k) \xrightarrow{d} Z_\infty^c(k) \equiv Z_\infty(k) - c(k),$$

and $(Z_T^c(k_i), i \le l) \xrightarrow{d} (Z_\infty^c(k_i), i \le l)$, where $c(k) = \ln k\cdot e_1$ for type 1, $-k^{-\xi}e_1$ for type 2, and $k^{-\xi}e_1$ for type 3 tails.

The asymptotic distribution is that of the random variable $Z_\infty(k)$. The density of $Z_\infty$ can be simulated. Analytical formulae for this density are given in Remark 5.1. Remark 5.2 explores the connections to the classical results.

Theorem 2 (Extreme Rank Asymptotics in Model 2) Suppose the assumptions of Theorem 1 hold, with Assumption 2 replacing Assumption 1. The statement of Theorem 1 remains valid, with points $\{J_i, X_i\}$ of $N$ defined in (5.22) and the centering constants $c(k)$ defined as follows: $c(k) = \ln k\cdot e_1 + c$ for type 1, $-k^{-\xi}c$ for type 2, and $k^{-\xi}c$ for type 3 tails.

We stated the result for Model 2 separately in order to emphasize the model's congeniality. It exhibits simplicity, while allowing flexible dependence of the tail on covariates. Both Models 1 and 2 allow us to fully characterize the limit process $N$, which defines a parsimonious limit distribution in terms of only two parameters: the tail index $\xi$ and the tail heterogeneity index $c$ (known for Model 1). The scaling constants $a_T$ are of the same form for both models.

Note that Theorems 1-2 allow for weakly dependent data as well. Dependent data is by no means the point of this paper, but since the proof takes only an additional half-page, we thought it a shame not to state the result. The imposed Meyer conditions require strong mixing and no clustering of the data sequence. See Lemma 5. Notably, because rare events separate in time, all the limits are identical to those of an independent sequence. This is analogous to the results of Robinson[66] on kernel estimation, where the relevant local events are asymptotically independent.

Again, it is not our goal to dwell on technicalities, but it is reasonable to examine the density of $Z_\infty$. If it were simple (it is not), it would be very useful in practice.

Remark 5.1 (Asymptotic Density) Let $\mathcal{H}$ be the set of all $d$-element permutations of integers $1, 2, \ldots$. Let $X(h)$ and $J(h)$ be the matrix with rows $X_t, t \in h$ and the vector with elements $J_t, t \in h$, respectively. $\{J_t\}$ are absolutely continuous, conditional on $\{X_t\}$. Conclude, mimicking computations of the gradient and finite-sample density for quantiles in Koenker and Bassett[48]: a. An argmin of $Q_\infty$ takes the form $z = X(h)^{-1}J(h)$ (passage through $d$ points) and it is unique iff

$$\zeta_h(z) \equiv \Big(k\mu_X - \sum_{i=1}^{\infty} 1(J_i < X_i'z)X_i\Big)'X(h)^{-1} \in \mathcal{D} \equiv (0,1)^d \tag{5.24}$$

and is non-unique if $\zeta_h(z) \in \partial\mathcal{D}$. If $X_t$ has an absolutely continuous component, $\det X(h)$ is absolutely continuous ($\det X(h)$ is the volume of the parallelogram formed by $X(h)$) so that $\zeta_h(z) \in \partial\mathcal{D}$ w.p. 0; b. Given (5.24), the density of $Z_\infty$ is

$$f_{Z_\infty}(z) = E\Big[\sum_{h \in \mathcal{H}} f_{J(h)|X(h)}(X(h)z)\cdot|\det X(h)|\cdot P(\zeta_h(z) \in \mathcal{D}\,|\,\{X_t\}, h)\Big],$$

where $f_{J(h)|X(h)}$ is the joint density of $J(h)$ conditional on $X(h)$.

Remark 5.2 (Relating to Classical Theory) $P(\zeta_h(z) \in \mathcal{D}\,|\,\{X_t\}, h)$ is hard to obtain explicitly. It simplifies in the no-$X$ case, $X = 1$: $P(\zeta_h(z) \in \mathcal{D}\,|\,h) = 1$ if $h = \lceil k \rceil$, and 0 if not. $k$ must be a non-integer for uniqueness. Hence $f_{Z_\infty}(z) = f_{J_{\lceil k \rceil}}(z)$, which is the limiting distribution of the $\lceil k \rceil$-th order statistic in i.i.d. samples.
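The no-covariate case lends itself to a small simulation check. The sketch below is illustrative only: it assumes type 3 tails with $X = 1$, so that the limit points are $J_i = \Gamma_i^{1/\alpha}$, truncates the point process at a finite number of points, and minimizes the limit objective $-kz + \sum_i (J_i - z)^{-}$ over the points themselves. The function names, the truncation level, and the values of $k$ and $\alpha$ are assumptions made for this example.

```python
import math
import numpy as np

def limit_points(alpha, n_points, rng):
    """Points J_i = Gamma_i**(1/alpha) of the limit process for type 3 tails with X = 1."""
    gammas = np.cumsum(rng.exponential(1.0, size=n_points))
    return gammas ** (1.0 / alpha)

def limit_objective(z, k, J):
    """Q_inf(z) = -k z + sum_i (J_i - z)^-, where (v)^- = max(-v, 0)."""
    return -k * z + np.sum(np.maximum(z - J, 0.0))

rng = np.random.default_rng(1)
k, alpha = 2.5, 1.0                      # non-integer k, hypothetical tail exponent
J = limit_points(alpha, n_points=50, rng=rng)

# The objective is convex and piecewise linear with kinks at the J_i,
# so it suffices to search over the points themselves. Points beyond the
# truncation level exceed every candidate z and contribute nothing.
values = [limit_objective(j, k, J) for j in J]
z_star = J[int(np.argmin(values))]

print(z_star, J[math.ceil(k) - 1])
```

In this sketch the minimizer coincides with the $\lceil k \rceil$-th smallest point, because the slope of the objective changes sign exactly when the number of points below $z$ passes $k$.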
6     Asymptotics under Intermediate Ranks

The intermediate rank concept requires $\tau \to 0$, $\tau T \to \infty$. In this section we first state the results for Models 1 and 2, followed by a brief explanation.

6.1 Results for Models 1 and 2. In addition to assumptions 1 and 2, we require existence of the density $f_U(\cdot|x)$ or, equivalently, of $\partial F_U^{-1}(\tau|x)/\partial\tau = x'\partial\beta(\tau)/\partial\tau$. This density should possess enough smoothness. We also need the conditional density tail-equivalence, an assumption that strengthens the tail equivalence of the conditional distribution functions in Models 1 and 2.


Assumption 3 (Density Conditions) (i) In Model 1, as $\tau \searrow 0$

$$\frac{\partial F_U^{-1}(\tau|x)}{\partial\tau} \sim \frac{\partial F_u^{-1}(\tau)}{\partial\tau} \tag{6.25}$$

uniformly in $x \in \mathbf{X}$, and in Model 2,

$$\frac{\partial F_U^{-1}(\tau|x)}{\partial\tau} \sim \frac{\partial F_u^{-1}(\tau/K(x))}{\partial\tau}. \tag{6.26}$$

(ii) $\partial F_u^{-1}(\tau)/\partial\tau$ is regularly varying at 0 with exponent $-\xi - 1$, cf. section 3.5 (denote $\partial F_u^{-1}(\tau)/\partial\tau \in \mathcal{R}_{-\xi-1}$).

Assumptions 3(i) and 3(ii) are both constructive and general. Assumption 3(i) is a stronger density analog of the tail equivalence conditions imposed on $F_U(\cdot|x)$ in Assumptions 1 and 2. Assumption 3(ii) is an analytical smoothness condition on the density. It was first proposed in the non-regression context by Dekkers and de Haan[21], who also show that the exceptions among the smooth distributions are rare (see the footnote below).

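Fix a reference index sequence $\{\tau\}$ such that $\tau \searrow 0$ and $\tau T \to \infty$. Consider $k$ sequences $\{\tau l_i\}, i \le k$, and define $Z_T \equiv (a_T(l_i)[\hat\beta(l_i\tau) - \beta(l_i\tau)], i \le k)$, where

$$a_T(l) \equiv \frac{\sqrt{\tau l T}}{\mu_X'(\beta(ml\tau) - \beta(l\tau))},$$

for positive $l$ and $m > 0$, $m \neq 1$. Set $a_T \equiv a_T(l)|_{l=1}$.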

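Theorem 3 (Intermediate Rank Asymptotics in Model 1) Suppose Assumptions 1 and 3 hold, and that $\{Y_t, X_t\}$ is an i.i.d. or stationary series, satisfying the conditions of Lemma 9. Then, as $\tau T \to \infty$, $\tau \searrow 0$,

$$a_T(\hat\beta(\tau) - \beta(\tau)) \xrightarrow{d} N(0, V), \qquad V \equiv Q_X^{-1}\frac{\xi^2}{(m^{-\xi}-1)^2}, \tag{6.27}$$

where $Q_X \equiv EXX'$. $Z_T \xrightarrow{d} N(0, \Omega)$, $\Omega_{ij} \equiv V \times \min(l_i, l_j)/\sqrt{l_i l_j}$. Furthermore, $a_T(l)$ can be replaced by $\sqrt{\tau l T}/\bar{X}'(\hat\beta(ml\tau) - \hat\beta(l\tau))$ without affecting the result.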

Remark 6.1 It may be useful to have the same normalization $a_T$ in place of $a_T(l)$ for the joint convergence in distribution. This is possible by noting that $a_T/a_T(l) \to l^{-\xi}/\sqrt{l}$. Then $(a_T(\hat\beta(l_i\tau) - \beta(l_i\tau)), i \le k) \xrightarrow{d} N(0, \Sigma)$, $\Sigma_{ij} \equiv \Omega_{ij}(l_i l_j)^{-\xi}/\sqrt{l_i l_j}$.
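The limit of the ratio of scalings in Remark 6.1 can be traced in two lines. The derivation below is a sketch that assumes the quantile spacing $\mu_X'(\beta(m\tau) - \beta(\tau))$ is regularly varying in $\tau$ at 0 with index $-\xi$, as the quantile function $F_u^{-1}$ is in section 3.5; that regularity for the spacing is taken here as a working assumption.

```latex
% Sketch, assuming g(\tau) \equiv \mu_X'(\beta(m\tau) - \beta(\tau)) is regularly varying
% at 0 with index -\xi, so that g(l\tau)/g(\tau) \to l^{-\xi} as \tau \searrow 0.
\begin{align*}
\frac{a_T}{a_T(l)}
  &= \frac{\sqrt{\tau T}}{\sqrt{\tau l T}} \cdot
     \frac{\mu_X'(\beta(m l \tau) - \beta(l\tau))}{\mu_X'(\beta(m\tau) - \beta(\tau))}
   = \frac{1}{\sqrt{l}} \cdot \frac{g(l\tau)}{g(\tau)}
   \;\to\; \frac{l^{-\xi}}{\sqrt{l}}, \\
\Sigma_{ij}
  &= \Omega_{ij}\cdot\frac{l_i^{-\xi}}{\sqrt{l_i}}\cdot\frac{l_j^{-\xi}}{\sqrt{l_j}}
   = \Omega_{ij}\,\frac{(l_i l_j)^{-\xi}}{\sqrt{l_i l_j}} .
\end{align*}
```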


Footnote: To see the plausibility, take near-algebraic distributions that are differentiable near finite or infinite lower end-points: $F_u(z) = Cz^{-1/\xi}(\ln z)^{\kappa}$ as $z \searrow 0$ or $F_u(z) = C(-z)^{-1/\xi}(\ln[-z])^{\kappa}$ as $z \searrow -\infty$, $\kappa \in \mathbb{R}$.

Theorem 4 (Intermediate Rank Asymptotics in Model 2) Suppose Assumptions 2 and 3 (where appropriate) hold, and that $\{Y_t, X_t\}$ is an i.i.d. or stationary series, satisfying the conditions of Lemma 9. The results of Theorem 3 remain valid, with the variance matrix $V$ taking the form

$$V \equiv Q_H^{-1}Q_X Q_H^{-1}\frac{\xi^2}{(m^{-\xi}-1)^2}, \qquad Q_H \equiv E[H(X)]^{-1}XX',$$

where $H(x) \equiv x'c$ for type 2 and 3, and 1 for type 1 tails.

Because the intermediate rank theory exploits the extremality of the relevant events, the limit is defined only by the tail parameters. Unlike the extreme rank approximation, the condition relies on the relative abundance of the relevant tail events, which leads to normality. This produces a convenient theory, on which effective and practical inference can be based, as further discussed in Section 7. Theorems 3 and 4 create a basis for a series of results that give consistent estimates of the important tail parameters (Section 7, Remark 6.2).

Lastly, dependent data are handled as well. Even though this is not the main focus of the paper, the proof is very short using the local CLT of Robinson [66] (see Lemma 9). Notably, the resulting limit is the same as for independent data (due to the separation of the tail events in time).

6.2. A Sketch. This provides a brief explanation of the result. The key difficult steps are treated in the appendix, Lemmas 7-10. Our approach differs substantively from the ingenious proof of Dekkers and de Haan [21] for the unconditional case. They use the Rényi representation of order statistics, an approach that cannot be applied here. The normalized statistic $Z_T = a_T(\hat\beta(\tau) - \beta(\tau))$ minimizes

$$Q_T(z) = \frac{a_T}{\sqrt{\tau T}} \sum_{t=1}^{T} \Big[\rho_\tau\big(Y_t - X_t'\beta(\tau) - X_t'z/a_T\big) - \rho_\tau\big(Y_t - X_t'\beta(\tau)\big)\Big].$$

We seek to find its finite-dimensional weak limit $Q_\infty(\cdot)$, so that by convexity we may conclude $\arg\min Q_T(z) \to_d \arg\min Q_\infty(z)$. Write

$$Q_T(z) = \frac{-1}{\sqrt{\tau T}} \sum_{t=1}^{T} \big(\tau - 1[Y_t < X_t'\beta(\tau)]\big) X_t'z + \frac{1}{\sqrt{\tau T}} \sum_{t=1}^{T} \mu_t(z)\big[(Y_t - X_t'\beta(\tau))\,a_T - X_t'z\big] \equiv W_T'z + G_T(z),$$

where $\mu_t(z) = 1(Y_t < X_t'\beta(\tau)) - 1(Y_t < X_t'\beta(\tau) + X_t'z/a_T)$.
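The decomposition above rests on a pointwise identity for the check function $\rho_\tau(u) = u(\tau - 1[u<0])$: for any $u$ and $v$, $\rho_\tau(u - v) - \rho_\tau(u) = -v(\tau - 1[u<0]) + (1[u<0] - 1[u<v])(u - v)$. The following sketch verifies this identity on a grid; the helper names are ours and purely illustrative.

```python
def rho(tau: float, u: float) -> float:
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def decomposition(tau: float, u: float, v: float) -> float:
    """Right-hand side: -v*(tau - 1[u<0]) + mu*(u - v), with mu = 1[u<0] - 1[u<v]."""
    ind0 = 1.0 if u < 0 else 0.0
    indv = 1.0 if u < v else 0.0
    return -v * (tau - ind0) + (ind0 - indv) * (u - v)


def max_identity_gap(tau: float, grid) -> float:
    """Largest absolute discrepancy between the two sides over all (u, v) on the grid."""
    return max(
        abs(rho(tau, u - v) - rho(tau, u) - decomposition(tau, u, v))
        for u in grid
        for v in grid
    )


grid = [i / 4 for i in range(-8, 9)]
gap = max_identity_gap(0.1, grid)
```

Setting $u = Y_t - X_t'\beta(\tau)$ and $v = X_t'z/a_T$, summing over $t$, and multiplying by $a_T/\sqrt{\tau T}$ gives the split of $Q_T(z)$ into a linear term and the remainder $G_T(z)$.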

Lemma 8, eq. (C.47), proves that Assumptions 3(i) and 3(ii) delicately imply

$$E\mu_t(z) = O\big(f_u(F_u^{-1}(\tau)) \cdot a_T^{-1}\big). \qquad (6.28)$$

This equality is not self-evident; in fact, it is counter-intuitive since $a_T$ may be converging to zero. Lemma 8, eq. (C.52), also shows that due to Assumption 3(ii)

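$$\frac{f_u(F_u^{-1}(\tau))\,\big[F_u^{-1}(m\tau) - F_u^{-1}(\tau)\big]}{\tau} \to \frac{m^{-\xi} - 1}{-\xi} \qquad (6.29)$$

for all $m > 0$, $m \neq 1$, as $\tau \searrow 0$. Then, by compactness of $\mathbf{X}$,

$$\mathrm{Var}\, G_T(z) = O\big(\tau^{-1} f_u(F_u^{-1}(\tau))\, a_T^{-1}\big) = o(1). \qquad (6.30)$$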

Lemma 9 handles the dependence. Thus $G_T(z) - E\,G_T(z) \to_p 0$. Lemma 8, using Assumptions 3(i) and (ii), proves that

$$E[G_T(z)] \to \frac{1}{2} z'Q_H z \cdot \frac{m^{-\xi} - 1}{-\xi} \equiv \frac{1}{2} z'J(m)z, \quad \text{for any } z, \qquad (6.31)$$

where $Q_H$ is defined in Theorem 4. For Model 1, this simplifies to $Q_H = Q_X$.

Assumption  3  is  instrumental  in  verifying  (6.31)  and  (6.28),  which  is  the  most 

difficult  and  important  part  of  the  proof. 

By the Lindeberg CLT or Robinson's local CLT for dependent data, Lemma 9,

$$W_T \to_d W_\infty = N(0, Q_X), \qquad (6.32)$$

and $\mathrm{Var}(W_T) \to Q_X$. Because of the separation of the tail events in time, dependence ceases to matter in the limit. Notably, the Liapunov and other central limit theorems which require strictly more than two bounded moments do not apply. Therefore, the finite-dimensional weak limit of $Q_T(\cdot)$ is

$$Q_\infty(z) = W_\infty'z + \frac{1}{2} z'J(m)z. \qquad (6.33)$$

Since $Q_T$ and $Q_\infty$ are convex and a.s. finite, and $Q_\infty$ is uniquely minimized at $Z_\infty = -J^{-1}(m)W_\infty = O_p(1)$, we conclude $Z_T \to_d Z_\infty$ by the convexity Lemma 1. The joint convergence follows similarly by considering a sum of scaled objective functions, and proceeding as above.

Lastly, the scaling $a_T$ can be replaced by its empirical analog:

$$\frac{\bar X'(\hat\beta(m\tau) - \hat\beta(\tau))}{\mu_X'(\beta(m\tau) - \beta(\tau))} = \frac{\bar X'(\hat\beta(m\tau) - \beta(m\tau))}{\mu_X'(\beta(m\tau) - \beta(\tau))} - \frac{\bar X'(\hat\beta(\tau) - \beta(\tau))}{\mu_X'(\beta(m\tau) - \beta(\tau))} + \frac{\bar X'(\beta(m\tau) - \beta(\tau))}{\mu_X'(\beta(m\tau) - \beta(\tau))} \to_p 1, \qquad (6.34)$$

since the first two elements on the r.h.s. are of order $O_p(1/\sqrt{\tau T}) = o_p(1)$.

The last display states that a population quantile spacing can be replaced by its empirical analog, which is remarkably useful for inference purposes (Section 7). This property is unexpected at first sight, see Remark 6.2.

Remark 6.2 (Empirical Regression Quantile Spacings.) Note that (6.34) does not follow from the convergence of $\hat\beta(\tau)$ to $\beta(\tau)$ under intermediate ranks, because $\beta(m\tau) - \beta(\tau)$ may be converging to 0 or diverging to infinity, as $\tau \searrow 0$. Furthermore, when $\sqrt{\tau T}/\mu_X'(\beta(m\tau) - \beta(\tau)) \to 0$, $\hat\beta(\tau)$ does not converge to $\beta(\tau)$. Indeed, take $F_u$ in Assumption 1 to be of type 2. Then $\mu_X'(\beta(m\tau) - \beta(\tau)) \sim \tau^{-\xi}L(\tau)$, for some slowly varying function $L$ at 0. Then if $\{\tau_T\}$ satisfies $\tau = cT^{-\lambda}$, $\lambda \in (0,1)$, and if $\xi - \frac{1-\lambda}{2\lambda} > a > 0$, then $\sqrt{\tau T}/\mu_X'(\beta(m\tau) - \beta(\tau)) \to 0$. That is, divergence arises when $\tau \to 0$ fast, and the tails are sufficiently thick. Notably, (6.34) remains valid.
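To make the rate comparison in Remark 6.2 concrete, suppose (as a simplifying assumption of ours) that the slowly varying factor is constant, so the spacing is proportional to $\tau^{-\xi}$, and that $\tau = cT^{-\lambda}$. Then $\sqrt{\tau T}/\tau^{-\xi}$ is proportional to $T^{(1-\lambda)/2 - \lambda\xi}$, and the sign of the exponent decides whether the scaling vanishes. The function names below are hypothetical.

```python
def scaling_exponent(lam: float, xi: float) -> float:
    """Exponent e such that sqrt(tau*T) / tau^(-xi) is proportional to T^e when tau = c*T^(-lam)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    return (1.0 - lam) / 2.0 - lam * xi


def scaling_vanishes(lam: float, xi: float) -> bool:
    """True when the exponent is negative, i.e. the scaling tends to zero as T grows."""
    return scaling_exponent(lam, xi) < 0.0


# Fast-vanishing tau with a thick tail: the exponent is (1 - 0.8)/2 - 0.8*1 = -0.7.
thick_and_fast = scaling_vanishes(0.8, 1.0)
# Slowly vanishing tau with a thinner tail: the exponent is 0.4 - 0.1 = 0.3.
thin_and_slow = scaling_vanishes(0.2, 0.5)
```

Under these assumptions the scaling vanishes exactly when $\xi > (1-\lambda)/(2\lambda)$, which is the "fast $\tau$, thick tails" regime described in the remark.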
7     Inference

The focus of this paper is the modeling and distribution theory for the extremal (high and low) regression quantiles. It is also desirable (but not feasible) to address all practical issues related to confidence intervals, hypothesis testing, and tail inference within the present paper. These questions are partly addressed in [13], and we hope to pursue them in future work. Here, we present a brief discussion of these issues.

7.1  Quantile  Spacings  and  Tail  Inference.    Estimation  of  the  tail  index  is  an 
important  problem  in  the  statistics  of  extreme  values,  as  discussed  in  Example  2.3. 
The  tail  parameters  also  enter  the  limit  distributions  obtained  earlier.   The  following 
results  show  how  to  estimate  them  by  the  sample  regression  quantile  spacings. 
Consider  the  following  statistics 

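$$\hat\varphi = \frac{x'(\hat\beta(m\tau) - \hat\beta(\tau))}{x'(\beta(m\tau) - \beta(\tau))}, \qquad \hat\rho_{x,\dot x,l} = \frac{x'(\hat\beta(ml\tau) - \hat\beta(l\tau))}{\dot x'(\hat\beta(m\tau) - \hat\beta(\tau))}, \qquad \rho_{x,\dot x,l} = \frac{x'(\beta(ml\tau) - \beta(l\tau))}{\dot x'(\beta(m\tau) - \beta(\tau))}.$$

Theorem 5 (Regression Quantile Spacings and Tail Inference) Under the assumptions of Theorem 3 or 4, as $\tau \searrow 0$, $\tau T \to \infty$, $\forall\, l, m > 0$, $m \neq 1$, $x, \dot x \in \mathbf{X}$:

(i) $\hat\varphi \to_p 1$,
(ii) $\hat\rho_{x,\dot x,l} - \rho_{x,\dot x,l} \to_p 0$, $\rho_{x,\dot x,l} \to l^{-\xi} \cdot [H(x)/H(\dot x)]$ (cf. Thm 4). In particular,
(iii) $\hat\xi \equiv -\frac{1}{\ln l} \ln \hat\rho_{\bar X,\bar X,l} \to_p \xi$,
(iv) $\hat\rho_{x,\bar X,1} \to_p x'c$, uniformly in $x$ ($\xi \neq 0$),
(v) for $\eta = \mu_X' Q_H^{-1} Q_X Q_H^{-1} \mu_X$, $l = m = 2$, if $\sqrt{\tau T}(\rho_{\bar X,\bar X,l} - \lim_T \rho_{\bar X,\bar X,l}) \to 0$,

$$\sqrt{\tau T}(\hat\xi - \xi) \to_d N\left(0,\ \eta \cdot \frac{\xi^2(2^{2\xi+1} + 1)}{(2(2^{\xi} - 1)\ln 2)^2}\right).$$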

Theorem 5 is a simple corollary of Theorems 3 and 4 and Lemma 10. (i) is by the same steps as equation (6.34), and (i) implies (ii)-(iv), using the properties (v)-(vi) in Lemma 10. Uniformity in $x$ in (iv) follows from the linearity of $\hat\rho_{x,\dot x,1}$ in $x$. (v) follows from Theorems 3 and 4 by the delta method. The results can be strengthened to uniform convergence in $l$, $m$, $x$ (Chernozhukov [13]).

Theorem 5 shows that the regression quantile spacings of the intermediate ranks consistently approximate the population spacings (results (i) and (ii)), which reveal the tail indices (results (iii) and (iv)).
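The relation between spacings and the tail index can be illustrated in the simplest setting with no covariates, where a regression quantile reduces to an ordinary quantile. The sketch below is a noise-free illustration under our own assumptions: the quantile function $q(\tau) = -\tau^{-\xi}$ is a hypothetical lower tail with index $\xi$, and the function names are ours. It applies the relation $\xi = -\ln \rho / \ln l$ to a ratio of quantile spacings.

```python
import math


def spacing_ratio(q, tau: float, l: float = 2.0, m: float = 2.0) -> float:
    """rho = (q(m*l*tau) - q(l*tau)) / (q(m*tau) - q(tau)) for a quantile function q."""
    return (q(m * l * tau) - q(l * tau)) / (q(m * tau) - q(tau))


def tail_index_from_spacings(q, tau: float, l: float = 2.0, m: float = 2.0) -> float:
    """Tail index implied by the spacing ratio: xi = -ln(rho) / ln(l)."""
    return -math.log(spacing_ratio(q, tau, l, m)) / math.log(l)


# Hypothetical lower-tail quantile function with tail index xi = 0.5.
xi_true = 0.5


def q(tau: float) -> float:
    return -tau ** (-xi_true)


xi_hat = tail_index_from_spacings(q, tau=0.01)
```

With an exact power-law quantile function the ratio equals $l^{-\xi}$ and the index is recovered up to rounding; with data, `q` would be replaced by estimated (regression) quantiles evaluated at a common covariate value.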

Results (iii) and (v) may be especially emphasized, because inference concerning the tail index is one of the most important problems in the statistics of extreme values, [26]. The proposed estimator $\hat\xi$ is a regression generalization of Pickands' estimator, if $l = m = 2$.¹⁰ $\hat\xi$ consistently estimates the tail index $\xi$ in the heteroscedastic regression Models 1 and 2. In fact, since $\mu_X'E(XX')^{-1}\mu_X = 1$ (normalize $\mu_X = (1, 0, \ldots)$), in Model 1 $\eta = 1$, so that the variance equals that of Pickands' estimator in the setting with no regressors, [26]:

$$\frac{\xi^2(2^{2\xi+1} + 1)}{(2(2^{\xi} - 1)\ln 2)^2}.$$

¹⁰ Going back to Pickands, such a choice is due to practical reasons but may vary in applications.
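For reference, the variance expression above is easy to evaluate. The sketch below computes it for $\xi \neq 0$ and converts it into an approximate standard error for the tail index estimator using the $\sqrt{\tau T}$ rate of Theorem 5(v). The function names and the default $\eta = 1$ (the Model 1 value) are our illustrative choices.

```python
import math


def pickands_asymptotic_variance(xi: float) -> float:
    """xi^2 * (2^(2*xi + 1) + 1) / (2 * (2^xi - 1) * ln 2)^2, for xi != 0."""
    if xi == 0:
        raise ValueError("the expression is stated here for xi != 0")
    numerator = xi ** 2 * (2.0 ** (2.0 * xi + 1.0) + 1.0)
    denominator = (2.0 * (2.0 ** xi - 1.0) * math.log(2.0)) ** 2
    return numerator / denominator


def tail_index_standard_error(xi: float, tau: float, T: int, eta: float = 1.0) -> float:
    """Approximate standard error sqrt(eta * variance / (tau * T))."""
    return math.sqrt(eta * pickands_asymptotic_variance(xi) / (tau * T))


# Example: xi = 1 with effective tail sample size tau * T = 0.1 * 500 = 50.
se = tail_index_standard_error(1.0, tau=0.1, T=500)
```

In practice $\xi$ would be replaced by its estimate, and in Model 2 by an estimate of $\eta$ as well.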

Thus, unlike Pickands' and other unconditional estimators, $\hat\xi$ specifically adapts to the presence of covariates that affect the scale and shape of the conditional density. And even when covariates do not matter, there is no efficiency loss in using our estimator rather than that of Pickands.
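The claim that $\eta = 1$ in Model 1 rests on the identity $\mu_X'E(XX')^{-1}\mu_X = 1$, which holds whenever $X$ contains a constant. A minimal check for $X = (1, x)'$ with a scalar regressor, using sample moments in place of expectations, is sketched below; the function name and the data are hypothetical.

```python
def eta_model1(xs) -> float:
    """mu' E(XX')^{-1} mu for X = (1, x), with sample moments of the scalar regressor x."""
    n = len(xs)
    m1 = sum(xs) / n                   # sample mean of x
    m2 = sum(x * x for x in xs) / n    # sample second moment of x
    det = m2 - m1 * m1                 # determinant of the moment matrix [[1, m1], [m1, m2]]
    if det <= 0:
        raise ValueError("the regressor must have positive sample variance")
    # Apply the inverse of the 2x2 moment matrix to mu = (1, m1).
    w0 = (m2 * 1.0 - m1 * m1) / det
    w1 = (-m1 * 1.0 + 1.0 * m1) / det
    return 1.0 * w0 + m1 * w1


eta = eta_model1([0.3, 1.7, 2.2, 5.0])
```

The inverse moment matrix maps $\mu_X$ to the first unit vector, so the quadratic form equals the first component of $\mu_X$, which is one.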

7.2 Confidence Intervals. We offer only a brief discussion, whereas details and an empirical application can be found in [13] and [15], respectively. For brevity (and all practical purposes) assume that the tail index $\xi \neq 0$.

Resampling. Subsampling, Romano et al. [58], is a simple, practical way of constructing the confidence intervals. The validity of subsampling is not immediate in our setting, since our statistics may be diverging. However, a simple modification restores its validity, Chernozhukov [13]. Another, different modification was proposed by Bertail et al. [8] for the ordinary sample quantiles and can be adapted here. In empirical work, subsampling generates well-behaved, sensible confidence intervals ([15], [8]).

The nonparametric bootstrap fails in the extreme rank case. (A well-known counterexample is that of extreme order statistics.) For the intermediate rank case, the bootstrap may work, at least when $a_T \to \infty$, since the statistic of interest is approximately an average. However, a simple bootstrap is unlikely to offer intervals of good quality, so that smoothing as in Horowitz [43] may be needed. Recently Bickel and Sakov [69] have shown that, for the case of the sample median, subsampling (with replacement) does better than the simple bootstrap. This might carry over to the intermediate rank cases.

Analytical Confidence Intervals: Intermediate Case. In the intermediate rank theory, the confidence intervals are simple to obtain. Theorem 5 provides the estimators for the tail index $\xi$ and the index $x'c$. The scaling constant $a_T$ can be replaced by its empirical analog, see Theorems 3 and 4. This fully operationalizes the intervals. The simplicity, convenience, and parsimony of the limit make it a significant competitor of the central rank theory for quantiles in the range up to .25-.3, across common data sets. Chernozhukov [13] offers a Monte Carlo confirmation, employing designs with different tail types and with continuous and discrete covariates. These intervals outperform the central ones (employing the methods in [50], built into S+).

Analytical Confidence Intervals: Extreme Rank Case. In this case the confidence intervals are involved but worth the trouble (Figures 2 and 5). First, approximate the distribution of $N$. $N$ is a Poisson process, so its Laplace functional is

$$\Psi_N(g) = E\exp\left(-\int g(u,x)\,dN(u,x)\right) = \exp\left(-\int \big(1 - e^{-g(u,x)}\big)\,dm_{\xi,c,F_X}(u,x)\right)$$

for measurable, continuous functions $g$ vanishing outside compact sets, where $m_{\xi,c,F_X}(A) = E[N(A)\,|\,\xi, c, F_X]$ is the intensity measure of $N$, defined in Lemma 6. $\Psi_N(g)$ is a continuous function of $\xi, c, F_X$. Thus the distribution of $\int g\,dN$ can be consistently estimated by that of $\int g\,dN_{\hat\xi,\hat c,\hat F_X}$, where $N_{\hat\xi,\hat c,\hat F_X}$ is the Poisson process with the intensity measure $m_{\hat\xi,\hat c,\hat F_X}$. The infinite sum $\int g\,dN_{\hat\xi,\hat c,\hat F_X}$ can be approximated by a finite sum, so that the distribution of $Z_\infty$ can be obtained by Monte Carlo,¹¹ as in Fig. 5.
To  estimate  the  scaling  Oj.,  we  can  rely  on  the  unconditional  case,  [26],  [8].  Employing 
the  approximation  //'^(/^(mr)  —  /3(t))  ~  c{rn~^  —  1)t~^,  project  X'{p{mT)  —  /3(r))  on 
c{m^^  —  \)t~^  for  different  r  and  m  to  get  c,  and  set  a-r  =  c(l/T)^^.  If  tT  grows 
polynomially  fast,  ax  far  — >  1.  [This  should  suffice  in  most  practical  cases;  in  very 
large  samples,  we  may  further  refine  this  as  r{Tn,T)  ~  C{m~^  —  l)r~^(— Inr)^  etc.] 
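The Laplace functional of a Poisson process, used in the extreme rank case above, can be checked by simulation in a toy setting. The sketch below is an illustration under our own simplifying assumptions, not the process $N$ of the text: it uses the unit-rate Poisson process on $[0, \infty)$ (no covariates) and the test function $g(u) = \max(0, 1-u)$, for which $\int_0^1 (1 - e^{-(1-u)})\,du = e^{-1}$, so the functional equals $\exp(-e^{-1})$.

```python
import math
import random


def unit_rate_points(rng: random.Random, upper: float) -> list:
    """Points of a unit-rate Poisson process on [0, upper], built from exponential gaps."""
    points = []
    gamma = rng.expovariate(1.0)
    while gamma <= upper:
        points.append(gamma)
        gamma += rng.expovariate(1.0)
    return points


def g(u: float) -> float:
    """Continuous, non-negative test function vanishing outside [0, 1]."""
    return max(0.0, 1.0 - u)


def laplace_functional_mc(reps: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of E exp(-sum of g over the points of the process)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += math.exp(-sum(g(u) for u in unit_rate_points(rng, 1.0)))
    return total / reps


closed_form = math.exp(-math.exp(-1.0))
estimate = laplace_functional_mc()
```

The same finite-sum device applies with an estimated intensity measure: one simulates the points, evaluates the integral of $g$ as a finite sum, and repeats.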

7.3. Which theory? Let us consider the examples in Figures 4 and 5, and look at the following factors: (i) the number of regressors, (ii) the theoretical quality of approximations, and (iii) the convenience and ease of estimating nuisance parameters.

First of all, covariates reduce the effective sample size. To that end, the concept of the effective rank is useful. The effective rank, $r$, is the ratio of the rank to the number of regressors, $\tau T/d$.¹² To motivate this, consider a simple "regression" quantile problem in a sample of 1000 observations and 10 dummy regressors, in which the target is the .2-th conditional quantile function. The estimate is the 20-th lowest order statistic in each of the 10 subsamples corresponding to the dummy variables. Figure 2 (A-C), corresponding to this example, suggests that the normal approximations are much worse than the extreme rank one. So if $r$ is less than or equal to 25-40, Figures 2 and 5 (A-C) favor the extreme rank approximation on quality grounds.
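The effective rank in the dummy-regressor example can be computed directly. The sketch below also encodes the rule of thumb from the text as a simple threshold; the function names are ours, and the cutoff of 30 is a hypothetical value inside the 25-40 band mentioned above, not a recommendation from the paper.

```python
def effective_rank(tau: float, T: int, d: int) -> float:
    """Effective rank r = tau * T / d: the rank divided by the number of regressors."""
    if d <= 0:
        raise ValueError("d must be a positive number of regressors")
    return tau * T / d


def suggested_approximation(tau: float, T: int, d: int, cutoff: float = 30.0) -> str:
    """Rule-of-thumb label; the cutoff is an assumed value within the 25-40 band."""
    if effective_rank(tau, T, d) <= cutoff:
        return "extreme rank"
    return "intermediate or central rank"


# The example in the text: T = 1000 observations, d = 10 dummies, tau = 0.2.
r = effective_rank(0.2, 1000, 10)
label = suggested_approximation(0.2, 1000, 10)
```

Here each of the 10 subsamples has 100 observations, so the .2-th quantile in each cell is the 20-th lowest order statistic, matching an effective rank of 20.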

When the effective rank $r$ is above 25-40, the normal intermediate or central rank approximations appear sensible. Irrespective of the sample size, the intermediate rank theory (in principle) should not be useful for the central quantiles (our modest examples suggest the range (.3, .7)). However, irrespective of the sample size, the intermediate rank theory is more useful for the high and low conditional quantiles ($\tau < .3$, $\tau > .7$) because of the simplicity of estimating nuisance parameters.

Because the intermediate rank theory exploits the extremality of the relevant events, it provides an approximation conveniently defined by the tail parameters (in contrast, the conventional theory requires the nonparametric conditional density function evaluated at the high or low quantile, which is hardly estimable with scarce tail data and many covariates). Of course, if we fix $\tau$ and let the sample size go to infinity, the central rank theory will dominate quality-wise, but the theoretical gain should be very small (Figure 5, D-F). In summary, we believe that the intermediate rank theory does better than the conventional theory at offering a more qualitative and practical approach to making inference on high and low conditional quantiles.

7.4 Other Results. In [13], we have further explored the asymptotic questions by looking at the empirical processes of the form $(a_T(\hat\beta(l\tau) - \beta(l\tau)), l \in \mathcal{L})$, $\mathcal{L} \subset (0, \infty)$. Convergence to either Gaussian or non-Gaussian processes has been demonstrated. These results have many practical applications in estimating tail parameters. Some hypothesis testing and refinements of the tail estimators in Theorem 5 are also explored.

¹¹ To obtain the theoretical approximation, (a) simulate $(J_i, X_i)$, $i \le n$, where $X_i$ are drawn from $F_X$ and $J_i$ are simulated as defined in Theorems 1 and 2; (b) solve $Z_{\infty,n} = \arg\min_z \sum_{i=1}^{n} \rho_{k/n}(J_i - X_i'z)$; repeat (a) and (b) $b$ times. $b$ and $n$ should be large. In practice, $X_i$ may be drawn from the empirical distribution function $\hat F_X$, and $J_i$ are drawn as defined in Theorems 1 and 2, replacing $\xi$, $X_i'c$ with $\hat\xi$, $X_i'\hat c$.

¹² A more refined version may be $\tau T/d$ times the determinant of the correlation matrix of $X$. If covariates are independent, this will give $\tau T$.
Conclusion

The present work provides a theoretical framework for studying the conditional extremal response - the near-extreme conditional quantile functions. We developed models that coherently combine the linear forms with flexible heteroscedasticity and extreme-value-theoretic restrictions. We suggested and motivated the concepts of extremality or data scarcity for regression quantiles - the intermediate and extreme rank sequences. We obtained the limit distributions under each of these sequences.

The numerical examples in Figures 2 and 5 suggest that these distributions approximate the finite-sample distributions better than or as well as the conventional theory, for quantiles in the range .01 - .3 (in samples of common size). These distributions are conveniently determined only by tail parameters. These tail parameters are easy to estimate (unlike the nonparametric conditional density evaluated at near-extreme quantiles, required in the conventional theory). We also provided tail estimators which, unlike the widely used Pickands and other classical procedures, specifically adapt to the setting where covariates affect the location, scale, and shape of the conditional density.

The relevance of these results stems from both the motivation for quantile regression models in data analysis and the importance of tail inference. The motivation was to explore many more features of the conditional distributions than just the center. For example, Abrevaya [2] and Koenker and Hallock [50] characterize the economic determinants of babies' very low birth-weights through the near-extreme conditional quantiles (.05 and below). Deaton [20] examines the food expenditure of Pakistani households by the .1-th and .9-th conditional quantiles. In our work, [15], we study the economic determinants of very high risk of an oil-producer's stock price. We find that the market factor is an unambiguously strong determinant, whereas other factors are not. Thus the level of extreme risks is mainly determined by the general economic activity. We also find and characterize the tail thickness of the conditional distribution, using the procedures developed here. Presently, Chernozhukov and Hong [14] are considering the robust estimation of the auction models, discussed in Example 2.4.

We  shall  (and  hope  others  will)  further  address  the  inference  questions,  such  as 
confidence  intervals,  hypothesis  testing,  and  tail  inference  procedures.  Over  more  than 
fifty  years,  an  elaborate  theory  of  inference  based  on  the  ordinary  sample  quantile  has 
been  developed,  [26],  and  now  forms  an  essence  of  the  extreme  value  theory.  Further 
progress  is  possible  by  building  on  these  developments. 
Figure 5: A simulation example for a classical model: $Y = X'\alpha + U$, where $U$ has the algebraic tail $F(u) \sim u^{-1/\xi}$ as $u \searrow -\infty$, $\xi = 1$. $X_1, X_2$ are symmetric Beta variables (normal looking, but with bounded support), $X_3$ and $X_4$ are dummy variables. The results are for one of the coefficients (others are similar). T=500. Replications = 5,000. Displays A-C: QQ-plot of Extreme and Central Rank Approximations. The dashed line is the central approximation, and the dotted line is the extreme rank approximation. The true quantiles of the exact sampling distribution are depicted by the solid line. The central rank approximation varies from very bad to bad for low quantiles $\tau = .025$ and $\tau = .1$ and becomes comparable to the extreme rank approximation only at $\tau = .2$. Displays D-F: QQ-plot of Intermediate and Central Rank Approximations. The dotted line is now the intermediate rank approximation. The theoretical central and intermediate rank approximations have approximately the same performance for $\tau = .1, .2, .3$ (using $m = 2, 1.5, 1.25$). The practical advantage of the intermediate rank is the parsimony and ease of estimating nuisance parameters. [QQ plots are over the 99% range.]
APPENDIX. The appendix gives proofs for Theorems 1-4 in the text (Corollaries 1 and 2), and studies Models 1 and 2 in detail. It first develops a set of simple "high-level" conditions that give the requisite convergence results. These conditions are verified for Models 1 and 2, leading to fairly compact proofs of Theorems 1-4. In addition, these conditions enable an extension of the applicability domain beyond Models 1 and 2, by only supplying the key convergence results (e.g. CLTs etc). We also include some background material.

In what follows, $\{Y_t, X_t\}$ is a (triangular) sequence of random variables taking values in $\mathbb{R}^1 \times \mathbb{R}^d$ and defined on the probability space $(\Omega, \mathcal{F}, P)$, possibly indexed by $T$. The outer $P^*$ and inner $P_*$ probability measures are defined in [76]. $\alpha$-mixing or strong mixing is defined e.g. in [18]. Use $F_u(z|x)$ to denote $F_{u_t}(z|X_t = x)$ in the sequel. Unless otherwise stated, $k$, $K$, $C$ and their modifications are generic constants.

A     Useful  background 
A.1      Point processes

These  definitions,  collected  for  the  reader's  convenience,  may  be  found  in  [65]  and  [52]. 

Definition A.1 (Point Measures, $M_p(E)$) Let $E$ be a locally compact topological space with a countable basis. Define $\mathcal{E}$ to be the Borel $\sigma$-algebra of subsets of $E$. A point measure (p.m.) $p$ on $(E, \mathcal{E})$ is a measure of the following form: for $\{x_i, i \ge 1\}$, a countable collection of points (called points of $p$), and any set $A \in \mathcal{E}$:

$$p(A) = \sum_i 1(x_i \in A).$$

If $p(K) < \infty$ for any compact $K \subset E$, then $p$ is said to be Radon. A p.m. $p$ is simple if $p(\{x\}) \le 1$ $\forall x \in E$, and is compound otherwise. Let $M_p(E)$ be the collection of all Radon point measures. A sequence $\{p_n\} \subset M_p(E)$ converges vaguely to $p$ if $\int f\,dp_n \to \int f\,dp$ for all functions $f \in C_K^+(E)$ [continuous, non-negative, and vanishing outside a compact set] (cf. Leadbetter et al. [52]). Vague convergence induces the vague topology on $M_p(E)$. The topological space $M_p(E)$ is metrizable as a complete separable metric space. $M_p(E)$ denotes such a metric space hereafter. Define $\mathcal{M}_p(E)$ to be the $\sigma$-algebra generated by the open sets.

Definition A.2 (Point Processes: Convergence in Distribution.) A point process (PP) in $M_p(E)$ is a measurable map

$$N : (\Omega, \mathcal{F}, P) \to (M_p(E), \mathcal{M}_p(E)),$$

i.e. for every elementary event $\omega \in \Omega$, the realization of the point process $N(\omega)$ is some point measure in $M_p(E)$. Thus, the concept of convergence in distribution (in law, weak convergence) of the point process $N_n$ taking values in $M_p(E)$ is the same as for any metric space, cf. [65]: we shall write

$$N_n \Rightarrow N \text{ in } M_p(E)$$

if $E_P h(N_n) \to E_P h(N)$ [i.e. $\int_\Omega h(N_n(\omega))\,dP(\omega) \to \int_\Omega h(N(\omega))\,dP(\omega)$] for all continuous and bounded functions $h$ mapping $M_p(E)$ (or $M_+(E)$) to $\mathbb{R}$. This implies that if $N_n \Rightarrow N$ in $M_p(E)$,

$$\int_E f(x)\,dN_n(x) \to_d \int_E f(x)\,dN(x)$$

for any $f \in C_K(E)$ by the continuous mapping theorem.


Definition A.3 (Poisson Point Process or Random Measure (PRM)) Point process $N$ is a PRM in $M_p(E)$ with mean intensity measure $m$ defined on $(E, \mathcal{E})$, if
(a) for any $F \in \mathcal{E}$ and any non-negative integer $k$

$$P(N(F) = k) = \begin{cases} e^{-m(F)}\, m(F)^k / k! & \text{if } m(F) < \infty, \\ 0 & \text{if } m(F) = \infty, \end{cases}$$

(b) for any $k \ge 1$, if $(F_i, i \le k)$ are disjoint sets in $\mathcal{E}$, $(N(F_i), i \le k)$ are independent.

Definition A.4 (Compositions and Transformation of PRM) To construct our limit processes, the following are helpful (see Propositions 3.7 and 3.8 in Resnick [65]).

1. (Canonical PRM) The PP with points $\{\Gamma_i, i \ge 1\}$ in $M_p(E)$, where $E = [0, \infty)$, $\Gamma_i = \sum_{j \le i} \varepsilon_j$, and $\{\varepsilon_j\}$ are i.i.d. unit exponential, is PRM with mean measure $m(du) = du$ on $(E, \mathcal{E})$.

2. Let $\{V_i, i \ge 1\}$ be i.i.d. random variables with law $F_V$, taking values in $(S, \mathcal{S})$, satisfying Definition A.1. Then the PP with points $\{\Gamma_i, V_i, i \ge 1\}$ is PRM in $M_p(E')$ with mean measure $m(du, dv) = du \times F_V(dv)$ on $(E', \mathcal{E}') = (E \times S, \mathcal{E} \times \mathcal{S})$.

3. Let $N_1$ be a PRM in $M_p(E_1)$ with points $\{G_i, i \ge 1\}$ and mean measure $m_1$ on $(E_1, \mathcal{E}_1)$. Then the PP $N_2$ with points defined by $\{T(G_i), i \ge 1\}$, where $T : (E_1, \mathcal{E}_1) \to (E_2, \mathcal{E}_2)$ is measurable, is PRM in $M_p(E_2)$ with mean measure $m(dg) = m_1 \circ T^{-1}(dg)$ defined on $(E_2, \mathcal{E}_2)$.
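The canonical construction in item 1 is easy to simulate: the points are cumulative sums of i.i.d. unit exponential variables, and the number of points falling in $[0, a]$ should then be Poisson with mean $a$ (Definition A.3). The sketch below checks the mean and variance of this count by Monte Carlo; the function names, the interval length $a = 3$, the number of replications, and the seed are illustrative choices of ours.

```python
import random


def canonical_prm_points(rng: random.Random, upper: float) -> list:
    """Points Gamma_1 < Gamma_2 < ... in [0, upper]: partial sums of unit exponentials."""
    points = []
    gamma = rng.expovariate(1.0)
    while gamma <= upper:
        points.append(gamma)
        gamma += rng.expovariate(1.0)
    return points


def count_moments(a: float = 3.0, reps: int = 20000, seed: int = 1):
    """Monte Carlo mean and variance of N([0, a]), the number of points in [0, a]."""
    rng = random.Random(seed)
    counts = [len(canonical_prm_points(rng, a)) for _ in range(reps)]
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / reps
    return mean, var


mean_count, var_count = count_moments()
```

For a Poisson count both moments should be close to $a$. Items 2 and 3 can be simulated in the same way by attaching independent marks to the points or by applying a measurable map $T$ to each point.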

A.2     Convex Semi-Continuous Objectives

The following result is from Knight [46]. It allows for general discontinuities and $\bar{\mathbb{R}}$-valued objective functions. The result is embedded by Knight [46] into the framework of stochastic equi-semi-continuity (s.e.-sc.) of the objective functions, which gives an elegant way of transforming the weak finite-dimensional (fidi-) convergence of objective functions into the weak convergence of argmins, provided the sequence of argmins is $O_p(1)$. In the case of convexity one has s.e.-sc. Related literature is [67], [70], [22].

Lemma 1 (Knight [46], p. 12) Suppose $\{Q_T\}$ is a sequence of lower-semi-continuous (lsc) convex $\bar{\mathbb{R}}$-valued random functions, defined on $\mathbb{R}^d$, and let $\mathcal{D}$ be a countable dense subset of $\mathbb{R}^d$. If $Q_T$ fidi-converges to $Q_\infty$ in $\bar{\mathbb{R}}$ on $\mathcal{D}$, where $Q_\infty$ is lsc convex and finite on an open non-empty set a.s., then $\arg\min_{z \in \mathbb{R}^d} Q_T(z) \to_d \arg\min_{z \in \mathbb{R}^d} Q_\infty(z)$, provided the latter is uniquely defined a.s. in $\mathbb{R}^d$.


B     Proofs  for  section  5,  extreme  ranks 
B.1      Details and definitions

Definition B.1 (Key Point Process, Space $E$) The key PP in $M_p(E)$ is

$$\hat N(\cdot) = \sum_{t=1}^{T} 1\{(a_T(U_t - b_T), X_t) \in \cdot\}. \qquad (B.35)$$

The mises-en-scène 1-3 define (1) the measurable spaces $(E, \mathcal{E})$, (2) the reference error $U_t$, (3) restrictions on the constants $a_T, b_T$, and (4) the rescaled estimators:

1. $E_1 = [-\infty, \infty) \times \mathbf{X}$, $U_t = Y_t - X_t'\beta_r$, $Z_T = a_T(\hat\beta(\tau) - \beta_r - b_T e_1)$, $a_T > 0$;

2. $E_2 = [[-\infty, \infty] \setminus \{0\}] \times \mathbf{X}$, $U_t = Y_t - X_t'\beta_r$, $Z_T = a_T(\hat\beta(\tau) - \beta_r)$, $b_T = 0$, $a_T > 0$;

3. $E_3 = [0, \infty) \times \mathbf{X}$, $U_t = Y_t - X_t'\beta_r \ge 0$, $Z_T = a_T(\hat\beta(\tau) - \beta_r)$, $b_T = 0$, $a_T > 0$; (B.36)

where $\beta_r = \beta(0)$ in the finite support case; $e_1 = (1, 0, \ldots, 0)'$; $\mathbf{X}$ is a compact subset of $\mathbb{R}^d$ s.t. $X_t \in \mathbf{X}$ $\forall t$. The $\sigma$-algebra $\mathcal{E}$ on $E$ is generated by the open sets of $E$.

Mises-en-scène 1, 2, 3 suit the case of $F_{u_t}(\cdot|X_t)$ with type 1 tails (finite and infinite support cases), type 2 tails (infinite support), and type 3 tails (finite support), respectively. The scaling constants $(a_T, b_T)$ for Models 1 and 2 are in the main text, section 3.6. They conceivably differ in other cases.

Remark B.1 (Compactification) The choice of $E_1$, $E_2$, $E_3$, that is, their topology, is important and simplifies the proofs considerably. We assume that the topology on $E_2$ and $E_1$ is induced via a standard two- and one-point compactification respectively (so that e.g. $[-\infty, a] \times \mathbf{X}$ is compact in $E_2$ for $a < 0$ and in $E_1$ for any $a < \infty$).

Definition B.2 (Limit Point Process.) We require that $\hat N \Rightarrow N$ in $M_p(E_i)$,

$$N(\cdot) = \sum_{i=1}^{\infty} 1\{(J_i, \mathcal{X}_i) \in \cdot\}, \qquad (B.37)$$

where $\{J_i, \mathcal{X}_i\}$ are random vectors in $E_i$ ($i = 1, 2, 3$), finitely-valued a.s.

Definition B.3 (Normalized Statistics and Limit.) Given the definition of the quantile regression estimator, the rescaled statistic $Z_T$ in (B.36) solves

$$Z_T = \arg\min_{z \in \mathbb{R}^d} \Big\{ Q_T(z, k) = \sum_{t=1}^{T} \rho_\tau\big(a_T(U_t - b_T) - X_t'z\big) \Big\} \qquad (B.38)$$

[write either $z = a_T(\beta - \beta_r - b_T e_1)$ or $z = a_T(\beta - \beta_r)$]. The weak limit $Z$ will solve

$$Z = \arg\min_{z \in \mathbb{R}^d} \Big\{ Q_\infty(z, k) = -k\mu_X'z + \int_E l(j, x'z)\,dN(j, x) \Big\}, \qquad (B.39)$$

where $l(u, v) = 1(u < v)(v - u)$, $\mu_X = \operatorname{plim} \bar X$. Since $N \in M_p(E_i)$, $Q_\infty(\cdot, k)$ is well defined and finite on an open non-empty subset of $\mathbb{R}^d$, under the conditions stated below.
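The link between the check-function objective and its point-process form is an exact finite-sample identity: with $\tau = k/T$, the sum of $\rho_\tau(u_t - x_t z)$ equals $-k\bar{x}z + \sum_t l(u_t, x_t z)$ plus the term $\tau\sum_t u_t$, which does not depend on $z$ and therefore does not affect the minimizer. The sketch below verifies this for a scalar regressor; the data values and function names are hypothetical, and `us` stands for the rescaled errors $a_T(U_t - b_T)$.

```python
def rho(tau: float, u: float) -> float:
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ell(u: float, v: float) -> float:
    """l(u, v) = 1(u < v) * (v - u)."""
    return (v - u) if u < v else 0.0


def check_objective(z: float, us, xs, k: float) -> float:
    """Sum over t of rho_tau(u_t - x_t * z) with tau = k / T."""
    tau = k / len(us)
    return sum(rho(tau, u - x * z) for u, x in zip(us, xs))


def point_process_objective(z: float, us, xs, k: float) -> float:
    """-k * xbar * z + sum over t of l(u_t, x_t * z)."""
    xbar = sum(xs) / len(xs)
    return -k * xbar * z + sum(ell(u, x * z) for u, x in zip(us, xs))


us = [-2.0, -0.5, 0.25, 1.0, 3.0]   # hypothetical rescaled errors
xs = [1.0, 0.5, 2.0, 1.5, 1.0]      # hypothetical scalar regressor values
k = 2.0
gaps = [
    check_objective(z, us, xs, k) - point_process_objective(z, us, xs, k)
    for z in (-1.0, 0.0, 0.5, 2.0)
]
```

Every entry of `gaps` equals $\tau\sum_t u_t$, so the two objectives share the same minimizers in $z$.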

Definition B.4 (Essential Uniqueness.) Let $\mathcal{K} = (K_1, K_2)$ s.t. $k \in \mathcal{K}$. Given $\omega \in \Omega$, let $\mathcal{K}_b(\omega)$, the break-points, be the set of $k_b \in \mathcal{K}$ s.t. $\arg\min_{z \in \mathbb{R}^d} Q_\infty(z, k_b)$ is not unique in $\mathbb{R}^d$. For any $\omega$, $\mathcal{K}_b(\omega) \neq \emptyset$ due to the piece-wise linear form of the objective [see gradient conditions in Remark 5.1]. We require the essential uniqueness: either (i) $k \notin \mathcal{K}_b$ w.p. 1 or (ii) the Lebesgue measure of $\mathcal{K}_b$ is zero, w.p. 1.
B.2     Basic conditions for nondegenerate asymptotics

The BC are formulated as general but constructive conditions that (i) lead to short proofs of Theorems 1 and 2, and (ii) allow extending the applicability beyond Models 1 and 2, by supplying only the requisite limit laws, such as the convergence of a point process.

Condition 1 BC.1 (Key Point Process) There exist constants $\{a_T, b_T\}$ and $E_i$ of the forms in (B.36) s.t. the point process $\hat N$, defined in (B.35), converges weakly to $N$ in $M_p(E_i)$, $\hat N \Rightarrow N$. (For $E_2$ we only require $\hat N(\cdot \cap E_2') \Rightarrow N(\cdot \cap E_2')$ in $M_p(E_2')$, where $E_2' = [-\infty, 0) \times \mathbf{X}$.) Points of $N$ are $\mathbb{R}^{d+1}$-valued a.s.

BC.2 (Stability and Compactness) $\{X_t\}$, $\{\mathcal{X}_i\}$ have support contained in a compact set $\mathbf{X} \subset \mathbb{R}^d$. The non-degenerate limit empirical distribution functions of $\{X_t\}$ and $\{\mathcal{X}_i\}$, $F_X$ and $F_{\mathcal{X}}$, exist with support in $\mathbf{X}$. $F_X$ has mean $\mu_X$.

BC.3 Design Conditions: (1) Essential uniqueness holds. (2) For space $E_2$, $Z_T'x < 0$ and $Z'x < 0$, $\forall x \in \mathbf{X}$, w.p. $\to 1$.

BC.4 For appropriate $E_i$, one of the following is true (for $c(k) \in \mathbb{R}^d$): (i) $a_T(\beta(\tau) - \beta_r - b_T e_1) \to c(k)$, (ii) $a_T(\beta(\tau) - \beta_r) \to c(k)$, (iii) $a_T(\beta(\tau) - \beta(0)) \to c(k)$.

Remark B.2 1. BC.1 is the key. We verify it for Models 1 and 2 under dependence conditions ruling out clusters of extremes. By supplying further convergence results, one can extend the applicability to a wider variety of data (e.g. panels). 2. BC.2 requires $\{X_t\}$, the properly scaled regressors, and $\{\mathcal{X}_i\}$ to have basic stability properties. Limit empirical distribution functions are known to exist under general conditions. The compactness condition is required for establishing the continuity of the mapping of the point process to the rescaled statistic. Relaxing this condition may alter the limits. BC.2 allows trends, e.g. if $X_t = t/T$, $X_t$ has support in $[0, 1]$, and the limiting empirical distribution is uniform. 3. Condition BC.3, given BC.2, is plausible (see section E). 4. Condition BC.3(2) is automatic in the main text but may not hold more generally. It requires that for space $E_2$, suiting type 2 tails, the reference line (w.l.o.g. the line can be chosen to be above the median) is below or equal to the low rank sample regression quantiles in the compact set $\mathbf{X}$ with arbitrarily small probability as $T \to \infty$. 5. BC.4: For Models 1 and 2, the constants $c(k)$ are stated in Theorems 1 and 2 and derived in Lemma 10.

Lemma 2 (Weak Convergence under Extreme Ranks) BC.1-3 imply, as $\tau T \to k$, $T \to \infty$ (for appropriate space $E$),

$$Z_T \xrightarrow{d} Z = \arg\min_{z \in R^d} \Big\{ Q_\infty(z) = -k\mu_X'z + \int_E l(u, x'z)\,dN(u,x) \Big\},$$

for almost every $k$, where $l(u,v) = 1(u < v)(v - u)$ and $N$ is defined in (B.37). This result could be stated in terms of the centered statistics. If BC.4 also holds, then

for $Z_T^c = a_T(\hat\beta(\tau) - \beta(\tau))$, $\quad Z_T^c \xrightarrow{d} Z^c = Z - c(k)$.

Proof: (1) Because $\tau T \to k$, take $\tau = k/T$ w.l.o.g. in the sequel. First, stabilize $Q_T(z,k)$ in (B.38) by subtracting a term that does not affect optimization:

$$\tilde{Q}_T(z,k) = -k\bar{X}'z + \sum_t l_\delta\big(a_T(U_t - b_T),\, X_t'z\big),$$

where $l_\delta(u,v) = 1(u < v)(v - u) - 1(u < -\delta)(-\delta - u)$, for $\delta > 0$. Such a renormalization makes the proof short, since $\tilde{Q}_T$ now becomes a (to be shown) continuous functional of $\hat{N}$:

$$\tilde{Q}_T(z,k) = -k\bar{X}'z + \int_E l_\delta(j, x'z)\,d\hat{N}(j,x). \qquad \text{(B.40)}$$

Part (2) shows that the fidi weak limit of $\tilde{Q}_T$ is a convex function in $z$:

$$\tilde{Q}_\infty(z,k) = -k\mu_X'z + \int_E l_\delta(j, x'z)\,dN(j,x). \qquad \text{(B.41)}$$

$\tilde{Q}_T(z,k)$ is convex $\forall T$, since it is a sum of convex functions in $z$. $\tilde{Q}_T(\cdot,k)$ is continuous, hence l.s.c. $\forall T$. By the convexity Lemma 1, for a.e. $k > 0$

$$Z_T \xrightarrow{d} Z = \arg\min_z \tilde{Q}_\infty(z,k), \qquad \text{(B.42)}$$

provided: (a) there is a non-empty open set $Z_0$ s.t. $\tilde{Q}_\infty(z,k)$ is finite a.s. $\forall z \in Z_0$, and (b) $Z$ exists and is a.s. unique for a.e. $k > 0$. (a) is shown in part 2 of this proof. (b) simply follows from BC.3(1), since $Z = \arg\min_z Q_\infty(z,k)$, for $Q_\infty$ defined in (B.39). Indeed, $\tilde{Q}_\infty(z,k)$ differs from $Q_\infty(z,k)$ by $\Delta = \int_E 1(j < -\delta)(-\delta - j)\,dN(j,x) = \sum_{i: J_i < -\delta} (-\delta - J_i)$, which is independent of $z$ and $|\Delta| < \infty$ a.s. *
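
The stabilization step can be checked numerically. The sketch below is illustrative only: the points `J`, the regressors `X`, and the value of `delta` are hypothetical stand-ins, not quantities from the paper. It shows that the sum of $l$ and the sum of $l_\delta$ differ by a constant that does not depend on $z$, so both objectives share the same minimizer.

```python
import numpy as np

def l(u, v):
    """Loss l(u, v) = 1(u < v)(v - u)."""
    return np.where(u < v, v - u, 0.0)

def l_delta(u, v, delta):
    """Stabilized loss: l(u, v) minus a term that does not depend on v."""
    return l(u, v) - np.where(u < -delta, -delta - u, 0.0)

rng = np.random.default_rng(0)
# Hypothetical finite set of points (J_i, X_i); not data from the paper.
J = rng.normal(size=50)
X = np.column_stack([np.ones(50), rng.uniform(-1.0, 1.0, size=50)])
delta = 0.5

def objective_gap(z):
    """Sum of l minus sum of l_delta at z; it should not depend on z."""
    v = X @ z
    return float(np.sum(l(J, v)) - np.sum(l_delta(J, v, delta)))

gap_a = objective_gap(np.array([0.3, -1.0]))
gap_b = objective_gap(np.array([-2.0, 4.0]))
# The constant Delta = sum over {i : J_i < -delta} of (-delta - J_i).
Delta = float(np.sum(np.where(J < -delta, -delta - J, 0.0)))
```

Here `gap_a`, `gap_b`, and `Delta` agree up to rounding, which is the finite-sample analogue of the statement that the two objectives differ by a term independent of $z$.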

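(2) Here we verify that

(i) $\tilde{Q}_\infty(\cdot,k)$ is indeed a fidi distributional limit,

(ii) there is an open non-empty set $Z_0$ s.t. $\tilde{Q}_\infty(z,k)$ is finite a.s. for all $z \in Z_0$.

(i). $\tilde{Q}_\infty(\cdot,k)$ is a weak fidi limit of $\{\tilde{Q}_T(\cdot,k)\}$ iff for any finite collection $(z_j, j \le l)$

$$\big(\tilde{Q}_T(z_j,k),\ j \le l\big) \xrightarrow{d} \big(\tilde{Q}_\infty(z_j,k),\ j \le l\big).$$

Since $\bar{X}'z_j \to \mu_X'z_j$, we only need to verify:

$$\Big(\int_E l_\delta(u, x'z_j)\,d\hat{N}(u,x),\ j \le l\Big) \xrightarrow{d} \Big(\int_E l_\delta(u, x'z_j)\,dN(u,x),\ j \le l\Big). \qquad \text{(B.43)}$$

Define the mapping from $M_p(E_i)$ to $R^l$ (for $E_i = E_1$, $E_2'$, or $E_3$)

$$T_i: N \mapsto \Big(\int_{E_i} l_\delta(u, x'z_j)\,dN(u,x),\ j \le l\Big).$$

(a) Consider $E_1$. The map $(u,x) \mapsto l_\delta(u, x'z_j)$ is in $C_K(E_1)$, since it is uniformly continuous on $E_1$ by construction and vanishes outside the compact subset $K$ in $E_1$:

$$K = [-\infty, \max(\kappa, -\delta)] \times X, \quad \text{where } \kappa = \max_{x \in X,\ z \in \{z_1,\dots,z_l\}} x'z.$$

$K$ is compact in $E_1$ since $\kappa < \infty$ by BC.2. Hence by construction $N \mapsto T_1(N)$ is continuous from $M_p(E_1)$ to $R^l$. Thus $\hat{N} \Rightarrow N$ in $M_p(E_1)$ implies $T_1(\hat{N}) \xrightarrow{d} T_1(N)$ by the continuous mapping theorem.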

* Indeed, in case of $E_3$, $j > 0$, hence $\Delta = 0$. In case of $E_1$, $E_2$, note (i) $N(K) < \infty$ a.s. for all $K$ compact by definition of $N \in M_p(E_i)$. $K = [-\infty, -\delta] \times X$ is indeed a compact subset of $E_1$ or $E_2$, cf. Remark B.1, so $\#\{i : J_i < -\delta\} < \infty$ a.s., (ii) points $\{J_i\}$ of $N$ are real-valued by BC.1. (i) and (ii) imply $|\Delta| < \infty$ a.s. by BC.1.
(b) Consider $E_3$. The map $(u,x) \mapsto l_\delta(u, x'z_j)$ is in $C_K(E_3)$: it is uniformly continuous on $E_3$ by construction and vanishes outside the compact subset $K$ in $E_3$:

$$K = [0, \max(\kappa, -\delta, 0)] \times X, \quad \text{where } \kappa = \max_{x \in X,\ z \in \{z_1,\dots,z_l\}} x'z.$$

$K$ is compact in $E_3$ since $\kappa < \infty$ by BC.2. Therefore, by construction $N \mapsto T_3(N)$ is continuous from $M_p(E_3)$ to $R^l$. Hence $\hat{N} \Rightarrow N$ in $M_p(E_3)$ implies $T_3(\hat{N}) \xrightarrow{d} T_3(N)$.

(c) Consider $E_2$. We claim it suffices to show the weak fidi convergence (B.43) only for points $Z_N = \{z : x'z < 0,\ \forall x \in X\}$, since $Z_T$ and $Z$ belong to such set w.p. $\to 1$, as $T \to \infty$, by BC.3. This claim is verified in Remark B.3. Note that the map $(u,x) \mapsto l_\delta(u, x'z)$ is in $C_K(E_2')$ if $z \in Z_N$, since it is uniformly continuous on $E_2'$ by construction and vanishes outside the compact subset $K$ in $E_2'$:

$$K = [-\infty, \max(\kappa, -\delta)] \times X, \quad \text{where } \kappa = \max_{x \in X,\ z \in \{z_1,\dots,z_l\}} x'z.$$

$K$ is compact in $E_2'$ since $\kappa < 0$ and $z \in Z_N$. Hence $N \mapsto T_2(N)$ is continuous from $M_p(E_2')$ to $R^l$. Hence $\hat{N} \Rightarrow N$ in $M_p(E_2')$ implies $T_2(\hat{N}) \xrightarrow{d} T_2(N)$.

(ii). To show (ii), pick distinct $\{z_1, \dots, z_l,\ l > d+1\}$, so that the convex hull $\mathcal{Z}$ of these points is non-empty and has positive Lebesgue measure in $R^d$; for $E_2$, additionally require $z_i \in Z_N$ for each $i$ (possible by compactness of $X$). Define $Z_0$ as the interior of $\mathcal{Z}$. By construction $Z_0$ is an open, bounded, non-empty subset of $R^d$. For any $z \in Z_0$, $(u,x) \mapsto l_\delta(u, x'z)$ is in $C_K(E_i)$, by the arguments in (i), which implies $\int_E l_\delta(u, x'z)\,dN(u,x)$ is finite a.s. To check this note (a) $l_\delta(u, x'z) \in C_K(E_i)$ implies $\#\{i : l_\delta(J_i, X_i'z) \ne 0\}$ is finite a.s. and (b) $l_\delta(u, x'z)$ is bounded on $E_i$. ■

Remark B.3 Consider $E_2$. We claimed it suffices to show the weak fidi convergence (B.43) only for points $Z_N = \{z : x'z < 0,\ \forall x \in X\}$, since $Z_T$ and $Z$ necessarily belong to such set w.p. $\to 1$ by BC.3. Consider the objective functions:

$$\tilde{Q}_T(z,k,\epsilon) = \tilde{Q}_T(z,k) + \phi\Big(\sup_{x \in X} x'z \le -\epsilon\Big),$$

$$\tilde{Q}_\infty(z,k,\epsilon) = \tilde{Q}_\infty(z,k) + \phi\Big(\sup_{x \in X} x'z \le -\epsilon\Big),$$

where $\phi(A) = 0$ if $A$ is true, $\phi(A) = \infty$ if not. They are convex and l.s.c. by construction, finite on an open non-empty set by the earlier arguments in part 2(ii) of the proof and by compactness of $X$ (so that it is possible to choose points $z$ s.t. $\sup_{x \in X} x'z < -\epsilon$). Hence by the convexity Lemma 1, $Z_T^\epsilon = \arg\min_z \tilde{Q}_T(z,k,\epsilon) \xrightarrow{d} Z^\epsilon = \arg\min_z \tilde{Q}_\infty(z,k,\epsilon)$ by the fidi weak convergence demonstrated in the proof of Lemma 2, part 2(c); except if $z_j$ is s.t. $\phi(\sup_{x \in X} x'z_j \le -\epsilon) = \infty$, in which case $\tilde{Q}_T(z_j,k,\epsilon) = \infty \to \tilde{Q}_\infty(z_j,k,\epsilon) = \infty$ in $\bar{R}$ trivially. Next choose $\epsilon$ small s.t. the probability that $Z_T^\epsilon$ and $Z^\epsilon$ differ from $Z_T$ and $Z$, respectively, is asymptotically as small as desired by BC.3(2). This shows $Z_T \xrightarrow{d} Z$.

Next let $Z_T(k) = \arg\min_{z \in R^d} Q_T(z,k)$ and $Z(k) = \arg\min_{z \in R^d} Q_\infty(z,k)$.

Lemma 3 BC.1-BC.3 imply $\mathbf{Z}_T = (Z_T(k_j)',\ j \le l)' \xrightarrow{d} \mathbf{Z} = (Z(k_j)',\ j \le l)'$.

Proof: $\mathbf{Z}_T \in \arg\min_{z \in R^{d \times l}} Q_T(z_1,k_1) + \dots + Q_T(z_l,k_l)$, for $z = (z_1, \dots, z_l)$. Since this objective is a sum of objective functions in Lemma 2, it retains the properties of the elements summed. Therefore the argument of Lemma 2 applies. ■
B.3      Limits  in  Models  1  and  2

Lemma  4,  Resnick[65],  states  conditions  (B.44)  and  (B.45)  for  convergence  to  a  simple  point 
process  N  in  Mp{E).  Lemma  5  shows  (B.45)  suffices  for  weakly  dependent  data.  Lemma  6 
verifies  (B.45)  and  finds  the  limit  N  in  Models  1  and  2.  Corollary  1  gives  Theorems  1-2. 

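Lemma 4 (Resnick[65], 3.22) Suppose $N$ is a simple point process in $M_p(E)$, $\mathcal{T}$ is a basis of relatively compact open sets s.t. $\mathcal{T}$ is closed under finite unions and intersections, and for $F \in \mathcal{T}$, $P(N(\partial F) = 0) = 1$. Then $\hat{N} \Rightarrow N$ in $M_p(E)$ if for $\forall F \in \mathcal{T}$:

$$\lim_{T \to \infty} P[\hat{N}(F) = 0] = P[N(F) = 0], \qquad \text{(B.44)}$$

$$\lim_{T \to \infty} E\hat{N}(F) = EN(F) < \infty. \qquad \text{(B.45)}$$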

Remark B.4 In our case, $\mathcal{T}$ consists of finite unions and intersections of bounded open rectangles in $E_1$, $E_2$, $E_3$. Remark B.1 gives the topology of $E_1$, $E_2'$, $E_3$.

We impose the Meyer[55] conditions on our "rare" events

$$A_t^T(F) = \{\omega \in \Omega : (a_T(U_t - b_T),\ X_t) \in F\}.$$

Lemma 5 (Poisson Limits) Suppose that for any $F \in \mathcal{T}$, the triangular sequence of events $\{(A_t^T(F),\ t \le T),\ T \ge 1\}$ is stationary and $\alpha$-mixing with the mixing coefficient $\alpha_T(\cdot)$, (B.45) holds, and there exist sequences of integers $(p_n, n \ge 1)$, $(q_n, n \ge 1)$, $(t_n = n(p_n + q_n), n \ge 1)$: as $n \to \infty$, for some $r > 0$ (a) $n^r \alpha_{t_n}(q_n) \to 0$, (b) $q_n/p_n \to 0$, $p_{n+1}/p_n \to 1$, and (c) that $I_{p_n} = \sum_{i=1}^{p_n - 1} (p_n - i) P\big(A_1^{t_n}(F) \cap A_{i+1}^{t_n}(F)\big) = o(1/n)$. Then in $M_p(E)$, $\hat{N} \Rightarrow N$, a PRM with mean measure $m$: $m(F) = \lim_{T \to \infty} E\hat{N}(F)$.

Proof. For any $F$: $m(F) > 0$, $\lim_{T \to \infty} P[\hat{N}(F) = 0] = P[N(F) = 0] = e^{-m(F)}$, by Meyer's theorem[55]. The same also holds for $F$: $m(F) = 0$, since $E\hat{N}(F) \to 0$ implies $P(\hat{N}(F) = 0) \to 1$ [$\hat{N}(F)$ is integer-valued]. Conclude by Lemma 4 and the definition of PRM. ■
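
As a simple illustration of the void-probability convergence in (B.44), consider an i.i.d. special case that is an assumption of this sketch rather than a setting from the paper: $U_t$ uniform on $(0,1)$, $a_T = T$, $b_T = 0$, no covariates, and $F = [0, c)$. Then $E\hat{N}(F) = c$ for every $T$, and the void probability is $(1 - c/T)^T$, which should approach $e^{-c}$.

```python
import math

def prob_no_points(T, c):
    """P[N_hat(F) = 0] for F = [0, c) when the rescaled points are T * U_t,
    with U_t i.i.d. Uniform(0, 1) (hypothetical example, requires c < T)."""
    return (1.0 - c / T) ** T

c = 2.0
limit = math.exp(-c)  # void probability of a Poisson count with mean m(F) = c
errors = [abs(prob_no_points(T, c) - limit) for T in (10, 100, 1000, 10000)]
```

The entries of `errors` shrink as `T` grows, in line with the Poisson limit. Under dependence, the mixing and anti-clustering conditions of Lemma 5 are what keep this limit Poisson.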

Remark B.5 $I_{p_n} = o(1/n)$ prevents clusters of "rare" events $A_t^T(F)$. It eliminates compound Poisson processes as limits. The Meyer condition generalizes Loynes[53]. Leadbetter et al.[52] offer a generalization, distributional mixing, which is not suited to the case with covariates $X$.

Lemma 6 (Limit N in Models 1 and 2) Under Assumption 2 or 1, and dependence conditions in Lemma 5, for the canonical constants $(a_T, b_T)$ defined in terms of $F_u$ in section 3.6, $\hat{N} \Rightarrow N$ in $M_p(E_i)$, a PRM with mean measure defined on $E$ ($E_1$, $E_2'$, or $E_3$) as:

$$m(du, dx) = K(x)\,h(du) \times F_X(dx),$$

where $h(u) = e^u$ for type 1, $h(u) = (-u)^{-\alpha}$ for type 2, and $h(u) = u^{\alpha}$ for type 3 tails. Points $(J_i, X_i)$ of $N$ have the representation

$$(J_i, X_i,\ i \ge 1) = \big(h^{-1}(\Gamma_i/K(X_i)),\ X_i,\ i \ge 1\big),$$

where $h^{-1}$ is the inverse of $h$, $\Gamma_i = \mathcal{E}_1 + \dots + \mathcal{E}_i$, $i \ge 1$ ($\{\mathcal{E}_i\}$ are i.i.d. standard exponential), and $\{X_i\}$ are i.i.d. r.v. with law $F_X$, independently distributed from $\{\Gamma_i\}$.

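In view of the form of $K(\cdot)$ (Lemma 10, see Assumption 2), the points of $N$ are

$$(J_i, X_i,\ i \ge 1) = \begin{cases} (\ln(\Gamma_i) + X_i'c,\ X_i) & \text{for type 1}, \\ (-\Gamma_i^{-1/\alpha} X_i'c,\ X_i) & \text{for type 2}, \\ (\Gamma_i^{1/\alpha} X_i'c,\ X_i) & \text{for type 3}, \end{cases} \quad i \ge 1. \qquad \text{(B.46)}$$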
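
The representation (B.46) is easy to simulate. The following is a minimal sketch under stated assumptions: the design (an intercept plus one centered uniform covariate), the vectors `c_type1` and `c_other`, and the value of `alpha` are hypothetical choices made for illustration, and the function name `limit_points` is not from the paper.

```python
import numpy as np

def limit_points(tail_type, n_points, c, alpha=2.0, seed=0):
    """Draw the first n_points points (J_i, X_i) of the limit process,
    using J_i = h^{-1}(Gamma_i / K(X_i)) written out by tail type."""
    rng = np.random.default_rng(seed)
    # Gamma_i = E_1 + ... + E_i with E_i i.i.d. standard exponential.
    gamma = np.cumsum(rng.exponential(size=n_points))
    # Hypothetical design: intercept plus a covariate centered at zero,
    # so that the mean regressor vector is (1, 0).
    X = np.column_stack([np.ones(n_points),
                         rng.uniform(-0.5, 0.5, size=n_points)])
    index = X @ c
    if tail_type == 1:
        J = np.log(gamma) + index
    elif tail_type == 2:
        J = -(gamma ** (-1.0 / alpha)) * index
    elif tail_type == 3:
        J = (gamma ** (1.0 / alpha)) * index
    else:
        raise ValueError("tail_type must be 1, 2, or 3")
    return J, X, gamma

c_type1 = np.array([0.0, 0.5])  # first coordinate 0, as in the type 1 case
c_other = np.array([1.0, 0.5])  # x'c stays positive on this design
J1, X1, G1 = limit_points(1, 200, c_type1)
J2, X2, G2 = limit_points(2, 200, c_other)
J3, X3, G3 = limit_points(3, 200, c_other)
```

In this sketch the type 2 points are all negative and the type 3 points all positive, matching the supports of the spaces used for these tail types.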
Proof of Lemma 6: By Lemmas 4 and 5, this reduces to verifying that the mean intensity measure $E\hat{N}(F)$ converges to $m(F)$, $\lim_T E\hat{N}(F) = m(F)$ for all $F$ in $\mathcal{T}$, which follows by straightforward calculations. To show the second part, construct a PRM with the intensity measure $m(\cdot)$. At first, define a canonical homogeneous PRM $N_1$ with points $\{\Gamma_i, i \ge 1\}$ (defined as above). It has the mean measure $m_1(du) = du$ on $[0, \infty)$, e.g. Resnick [65]. Secondly, by the (Composition) Proposition 3.8 in [65], the composed PP $N_2$ with points $\{\Gamma_i, X_i\}$ is PRM with the mean measure

$$m_2(du, dx) = du \times F_X(dx)$$

on $[0, \infty) \times X$, because $\{X_i\}$ are i.i.d. and are independent of $\{\Gamma_i\}$ (see Def. A.4). Finally, the PP $N$ with the transformed points $\{T(\Gamma_i, X_i)\}$, where

$$T: (u, x) \mapsto \big(h^{-1}(u/K(x)),\ x\big),$$

is PRM with the desired mean measure on $E \times X$

$$m(dj, dx) = m_2 \circ T^{-1}(dj, dx) = K(x)\,h(dj) \times F_X(dx),$$

by the Transformation Proposition 3.7 in Resnick[65] (see Def. A.4). ■

Corollary 1 (Theorems 1-2) Lemmas 3-6, Lemma 10, and section D verify the conditions BC.1-4 ($E_1$, $E_2$, $E_3$ suit the tail types 1-3). Hence we have Theorems 1-2.

C      Proofs  for  section  6,  intermediate  ranks 
C.1      Basic  conditions  for  a  normal  limit

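Define the following key variables:

$$W_T(l) = \frac{-1}{\sqrt{\tau l T}} \sum_t \big(l\tau - 1(Y_t \le X_t'\beta(\tau l))\big) X_t,$$

$$G_T(l,z) = \frac{1}{\sqrt{\tau l T}} \sum_t \mu_t(l,z) \big[-X_t'z + (Y_t - X_t'\beta(\tau l))\,a_T(l)\big],$$

where $\mu_t(l,z) = \big(1(Y_t \le X_t'\beta(\tau l)) - 1(Y_t \le X_t'\beta(\tau l) + X_t'z/a_T(l))\big)$.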

Condition 2 There exists a sequence $\{a_T(l)\}$ such that as $\tau T \to \infty$, $\tau \searrow 0$:

BC*1 (Analytical Tail Property) $\lim_T E_P G_T(l,z) = \frac{1}{2} z'J(l)z$, $J(l)$ is invertible $\forall l > 0$.

BC*2 (LLN) $\|G_T(l,z) - E_P G_T(l,z)\| \xrightarrow{p} 0$ for any fixed $l$ and $z$.

BC*3 (CLT) $\{W_T(l_1), \dots, W_T(l_m)\} \xrightarrow{d} N(0, \mathcal{G})$ for any finite collection $0 < l_i$, $i \le m$.

Remark C.1 1. The (properly scaled) objective function is $W_T(l)'z + G_T(l,z)$. The BC imply its linear-quadratic normal limit, as in the main text. 2. The analytical condition BC*1 is most important. Under independence it implies BC*2, while BC*3 holds by Lindeberg-Feller. To check, write $G_T(z,l) = \frac{1}{\sqrt{\tau l T}} \sum_t r_t(z,l)$; BC*1 requires $E r_t(z,l) \sim \sqrt{\tau T}\,T^{-1} c$. By compactness of $X$ and binomiality of $r_t$ (a binomial variable times a bounded variable): $Var\big(\sum_t r_t(z,l)/\sqrt{\tau T}\big) = O(E r_t(z,l)/\tau) = O(1/\sqrt{\tau T}) \to 0$. BC*1 is purely analytical. Section C.2 verifies BC*1 in Models 1 and 2 under the additional Assumption 3. 3. Many CLTs/LLNs imply BC*3 and BC*2 when applied with care (e.g. [18], [24], [32]). Care means that the CLTs should require no more than two asymptotically bounded moments of $W_T(l)$. E.g. Liapunov, $L_2$-mixingale, and NED CLTs don't apply; but Lindeberg and $L_1$-mixingale CLTs do. In Lemma 9 we adapt the local CLTs in Robinson [66], designed for kernel estimators.
Lemma 7 (Weak Convergence under Intermediate Ranks) BC*1-BC*3 imply, as $\tau T \to \infty$, $\tau \searrow 0$: $\big(a_T(l_i)(\hat\beta(l_i\tau) - \beta(l_i\tau)),\ i \le m\big) \xrightarrow{d} N(0, \Omega)$, $\Omega_{ij} = J^{-1}(l_i)\,g_{ij}\,J^{-1}(l_j)$.
Proof. Case of $m = 1$ is notationally identical to the proof stated after Theorem 3 in the main text, which required only the conditions above. The (joint convergence) proof for $m \ge 2$ is analogous, so the repetition is omitted. ■
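
The linear-quadratic step behind this lemma can be sketched numerically. Assuming the limit objective has the form $W'z + \frac{1}{2} z'Jz$ with $J$ positive definite, its minimizer is $-J^{-1}W$, and a mean-zero normal $W$ with covariance $g$ gives a minimizer with covariance $J^{-1} g J^{-1}$. The matrices `J` and `G` below are arbitrary hypothetical inputs, not the paper's quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = rng.normal(size=(d, d))
J = A @ A.T + d * np.eye(d)   # hypothetical positive definite J(l)
B = rng.normal(size=(d, d))
G = B @ B.T + np.eye(d)       # hypothetical covariance of W
W = rng.multivariate_normal(np.zeros(d), G)

def objective(z):
    """Linear-quadratic limit objective W'z + (1/2) z'Jz."""
    return W @ z + 0.5 * z @ J @ z

# First-order condition W + J z = 0 gives the minimizer.
z_star = -np.linalg.solve(J, W)
gradient = W + J @ z_star

# Sandwich covariance of the minimizer when W ~ N(0, G).
J_inv = np.linalg.inv(J)
Omega = J_inv @ G @ J_inv
```

The gradient vanishes at `z_star`, and `Omega` is the single-index analogue of the matrix $\Omega_{ij}$ in the lemma.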

C.2     Limits  in  Models  1  and  2 

This  section  verifies  that  the  conditions  BC*1  and  BC*2  hold  in  Model  2  (and  1) 

Lemma  8  (Analytical  Tail  Property)    Under  assumptions  1  or  2  and  3,  BC*1  holds,  with 
J{1)  given  in  Theorems  4  and  3  for  Models  2  and  1,  respectively. 

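Proof: Suppress $l$ ($l = 1$). Write $E(G_T(z)) = E\sum_t \Big[\frac{-\mu_t X_t'z}{\sqrt{\tau T}} + \frac{\eta_t}{\sqrt{\tau T}}\Big]$, where $\eta_t = \mu_t(Y_t - X_t'\beta(\tau))a_T$. Use $F_t$ and $f_t$ to denote $F_{U_t}(\cdot|X_t)$ and $f_{U_t}(\cdot|X_t)$:

$$
\begin{aligned}
\frac{E[-\mu_t X_t'z \mid X_t]}{\sqrt{\tau T}}
&= \frac{-1}{\sqrt{\tau T}} \Big\{ F_t[F_t^{-1}(\tau)] - F_t\Big[F_t^{-1}(\tau) + \frac{X_t'z}{a_T}\Big] \Big\} \cdot X_t'z \\
&\overset{(1)}{=} \frac{1}{\sqrt{\tau T}} \cdot \frac{1}{a_T} \cdot f_t\big(F_t^{-1}(\tau) + v_\tau\big) \cdot (z'X_t)^2, \qquad v_\tau = o\big(F_u^{-1}(m\tau) - F_u^{-1}(\tau)\big), \\
&\overset{(2)}{\sim} \frac{1}{\sqrt{\tau T}} \cdot \frac{1}{a_T} \cdot f_t\big(F_t^{-1}(\tau)\big) \cdot (z'X_t)^2 \\
&\sim \frac{1}{T} \cdot \frac{F_u^{-1}(m\tau) - F_u^{-1}(\tau)}{\tau \big(f_t[F_t^{-1}(\tau)]\big)^{-1}} \cdot (z'X_t)^2 \\
&\overset{(3)}{\sim} \frac{1}{T} \cdot \frac{m^{-\xi} - 1}{-\xi} \cdot \frac{1}{H(X_t)} \cdot z'X_tX_t'z, \quad \text{uniformly in } t. \qquad \text{(C.47)}
\end{aligned}
$$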

Equality (1) is from the definition of $a_T$, which gives $1/a_T = o(F_u^{-1}(m\tau) - F_u^{-1}(\tau))$, and a Taylor expansion. The equivalence (2) is by the assumed (Assumption 3) regular variation and uniform in $t$ tail-equivalence: $1/f_t(F_t^{-1}(\tau)) \sim \partial F_u^{-1}(\tau/K(x))/\partial\tau \in R_{-\xi-1}$. Pick $x = \mu_X$ so that $K(x) = 1$ for now. By the definition of regular variation, locally uniformly in $l$ [uniformly in $l$ in any compact subset of $(0, \infty)$]

$$f_u(F_u^{-1}(l\tau)) \sim l^{\xi+1} f_u(F_u^{-1}(\tau)). \qquad \text{(C.48)}$$

I.e. locally uniformly in $l$

$$f_u\big(F_u^{-1}(\tau) + [F_u^{-1}(l\tau) - F_u^{-1}(\tau)]\big) \sim l^{\xi+1} f_u(F_u^{-1}(\tau)). \qquad \text{(C.49)}$$

Hence for any $l_\tau \to 1$,

$$f_u\big(F_u^{-1}(\tau) + [F_u^{-1}(l_\tau\tau) - F_u^{-1}(\tau)]\big) \sim f_u(F_u^{-1}(\tau)).$$

Hence for any sequence $v_\tau = o([F_u^{-1}(m\tau) - F_u^{-1}(\tau)])$ with $0 < m \ne 1$, as $\tau \searrow 0$:

$$f_u(F_u^{-1}(\tau) + v_\tau) \sim f_u(F_u^{-1}(\tau)),$$

because for any such $\{v_\tau\}$ we can find a sequence $l_\tau \to 1$ s.t. $\{v_\tau\} = \{[F_u^{-1}(l_\tau\tau) - F_u^{-1}(\tau)]\}$. Now because (a): $1/f_t(F_t^{-1}(\tau)) \sim \partial F_u^{-1}(\tau/K(x))/\partial\tau$ uniformly in $t$ by Assumption 3, and (b): $f_u(F_u^{-1}(l\tau/K)) \sim (l/K)^{\xi+1} f_u(F_u^{-1}(\tau))$, locally uniformly in $l$ and uniformly in $K \in \mathcal{K}_X = \{K(x) : x \in X\}$ [compact by assumptions on $K(\cdot)$ and $X$], by (C.48); the equivalence (2) in (C.47) now follows uniformly in $t$.
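
The regular variation property (C.48) can be illustrated on a concrete lower tail. The example cdf below, $F_u(z) = 1/(1 + (-z)^{\alpha})$ for $z < 0$, is a hypothetical choice for this sketch (it is not one of the paper's models); it has a type 2 lower tail with $\xi = 1/\alpha$, so the density-quantile ratio should approach $l^{\xi+1}$ as $\tau \searrow 0$.

```python
import math

ALPHA = 2.0       # hypothetical tail parameter of the example cdf
XI = 1.0 / ALPHA  # tail index of this type 2 example

def quantile(tau):
    """Quantile function of F_u(z) = 1 / (1 + (-z)^ALPHA), z < 0."""
    return -((1.0 / tau - 1.0) ** (1.0 / ALPHA))

def density(z):
    """Derivative of F_u(z) = 1 / (1 + (-z)^ALPHA) on z < 0."""
    y = -z
    return ALPHA * y ** (ALPHA - 1.0) / (1.0 + y ** ALPHA) ** 2

def density_quantile_ratio(l, tau):
    """f_u(F_u^{-1}(l * tau)) / f_u(F_u^{-1}(tau))."""
    return density(quantile(l * tau)) / density(quantile(tau))

l = 3.0
taus = (1e-2, 1e-4, 1e-6)
ratios = [density_quantile_ratio(l, tau) for tau in taus]
target = l ** (XI + 1.0)
gaps = [abs(r - target) for r in ratios]
```

The entries of `gaps` decrease toward zero as `tau` decreases, which is the pointwise content of (C.48) for this example; the local uniformity in $l$ is a separate matter that this sketch does not check.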
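The equivalence (3) in (C.47) follows from (C.50)-(C.52). At first, by assumption $1/f_t(F_t^{-1}(\tau)) \sim \partial F_u^{-1}(\tau/K(x))/\partial\tau = 1/\{K(x) f_u[F_u^{-1}(\tau/K(x))]\}$, uniformly in $t$. Hence, uniformly in $x = X_t$:

$$\frac{F_u^{-1}(m\tau) - F_u^{-1}(\tau)}{\tau \big(f_t[F_t^{-1}(\tau)]\big)^{-1}} \sim \frac{F_u^{-1}(m\tau) - F_u^{-1}(\tau)}{\tau \big(K(x) f_u[F_u^{-1}(\tau/K(x))]\big)^{-1}}. \qquad \text{(C.50)}$$

But by Lemma 10(v) and Fact D.1(ii), uniformly in $x \in X$:

$$\frac{F_u^{-1}(m\tau) - F_u^{-1}(\tau)}{F_u^{-1}(m\tau/K(x)) - F_u^{-1}(\tau/K(x))} \sim 1/H(x), \qquad \text{(C.51)}$$

where $H(x) = 1$ if $\xi = 0$, $H(x) = x'c$ if $\xi \ne 0$. And

$$\frac{F_u^{-1}(m\tau/K(x)) - F_u^{-1}(\tau/K(x))}{\tau \big(K(x) f_u[F_u^{-1}(\tau/K(x))]\big)^{-1}} = \int_1^m \frac{f_u[F_u^{-1}(\tau/K(x))]}{f_u[F_u^{-1}(s\tau/K(x))]}\,ds \overset{(1)}{\sim} \int_1^m s^{-\xi-1}\,ds = \frac{m^{-\xi} - 1}{-\xi} \quad (\ln m \text{ if } \xi = 0), \qquad \text{(C.52)}$$

where the equivalence (1) is by the assumed regular variation property (C.48).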

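Finally, calculations for the term $\eta_t/\sqrt{\tau T}$ are analogous to (C.47), using change of variables:

$$\frac{E[\eta_t \mid X_t]}{\sqrt{\tau T}} \sim -\frac{1}{2T} \cdot \frac{m^{-\xi} - 1}{-\xi} \cdot \frac{1}{H(X_t)} \cdot z'X_tX_t'z, \quad \text{uniformly in } t. \qquad \text{(C.53)}$$

Combine (C.53) and (C.47) to conclude. ■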

Lemma  9  (CLT  &  LLN)  Assume  Model  2  and  Assumption  3.  Let  {Yj,Xjy_^  be  a  sta- 
tionary Q-inixing  triangular  sequence,  (i)  If  Oj  =  Oij'"^),  4>  >  2,  and  for  any  K  sufBciently 
close  to  0"*"  or  —00,  uniformly  in  t  and  s  >  1,  tiiere  is  C  >  0,  independent  of  K  s.t.  (Pt  denotes 
P{\:F,),Tt=a{{Y,,X,y_-^)): 

$$P_t(U_t \le K,\ U_{t+s} \le K) \le C\,P_t(U_t \le K)^2, \qquad \text{(C.54)}$$

BC*3 holds with $g_{ij} = Q_X \min(l_i, l_j)/\sqrt{l_i l_j}$. If (C.54) is dropped, BC*3 still holds with
g,j  =  hmr  Ep{l,)p{lj),  p{l)  =  si-  EisilJ'i),  st  =  {l[Yt  <  X',(S{It)]  -  Tl)Xt/yM.  (ii)  If 
a,  =0{j~'^),4>>  Y^,  0  <7  <  1,  andr'"T/r->  0,  then  BC*2  holds. 
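
To see where a covariance of the form $Q_X \min(l_i, l_j)/\sqrt{l_i l_j}$ comes from, the sketch below works out the simplest case exactly. Everything here is an assumption made for illustration: observations are i.i.d., the only regressor is the intercept (so $Q_X = 1$), the conditional quantile error is reduced to a uniform variable on $(0,1)$, and each score is normalized by $\sqrt{l\tau}$. The function name `g_entry` is hypothetical.

```python
import math

def g_entry(l_i, l_j, tau):
    """Exact covariance of (1(U <= l_i*tau) - l_i*tau) / sqrt(l_i*tau)
    and its l_j analogue, for U ~ Uniform(0, 1) (hypothetical example)."""
    a, b = l_i * tau, l_j * tau
    # Cov(1(U <= a), 1(U <= b)) = min(a, b) - a * b for a uniform U.
    return (min(a, b) - a * b) / math.sqrt(a * b)

tau = 1e-6
g_12 = g_entry(1.0, 4.0, tau)
g_11 = g_entry(2.0, 2.0, tau)
limit_12 = min(1.0, 4.0) / math.sqrt(1.0 * 4.0)
```

As `tau` shrinks, `g_12` approaches `limit_12 = 0.5` and `g_11` approaches 1. Under dependence, condition (C.54) is what rules out additional cross-time covariance terms.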

Remark C.2 (C.54) means the extremal events should not cluster too much. It may be refined slightly along the lines of Watts et al.[77] (no-covariate case). (C.54) is analogous to the local no-clustering conditions of Robinson[66] (A7.4, p. 191) in the context of kernel density estimation.

Proof of Lemma 9: (i) $\{W_T(l_i), i \le m\}$ suits the CLT of Robinson[66]. His conditions A7.1 (with $q = 0$), A7.2, and A7.3 are satisfied automatically. The assumed mixing condition implies $\sum_{j=1}^{\infty} j\alpha_j < \infty$, which implies his condition A3.3. Lastly, condition (C.54) ensures A7.4. If (C.54) does not hold, apply the theorem of M.I. Gordin (see [39], p. 137). It is easy to check, using the assumed condition and the classical Ibragimov inequality [18], that $\{s_t, \mathcal{F}_t\}$ is a stationary $L_2$-mixingale of size $-1$, and $E\|W_T(l)\|_2 < K$ uniformly in $T$, thus $E\|W_T(l)\|_1 < K'$. This verifies the conditions of the Gordin theorem. (ii) Suppress $l$ in notation (irrelevant). $Var(G_T(z)) = \tau^{-1} O\big(Var(\mu_1) + 2\sum_{k \ge 1} E\mu_1\mu_{1+k}\big)$, for $\mu_t$ defined earlier. By binomiality of $\mu_t$ and the calculations analogous to those in Lemma 8 (denote by $\|\cdot\|_{r,P}$ the $L_r(P)$ norm):

$$Var(\mu_t) = O(\|\mu_t\|_{1,P}) = O\big(f_u(F_u^{-1}(\tau))\,a_T^{-1}\big) = O\big(\sqrt{\tau/T}\big),$$

$$\|E(\mu_1\mu_{1+k})\| = O\big(\alpha_k^{1-\gamma} \|\mu_1\|_{r,P} \|\mu_1\|_{p,P}\big) = O\big(\alpha_k^{1-\gamma} \|\mu_1\|_{1,P}^{\gamma}\big) = O\big(\alpha_k^{1-\gamma} (\tau/T)^{\gamma/2}\big)$$

(for $1/p + 1/r = \gamma \in (0,1)$, $p \ge 1$), by the Ibragimov inequality ([18]). So $Var(G_T(z)) = o(1)$. ■

Corollary 2 (Theorems 3 and 4) Lemmas 7, 8, 9 yield Theorems 3 and 4.
D     Properties  of  Models  1  and  2

This section derives the essential properties of Models 1 and 2. Denote by $M$ any compact subset of $(0, \infty)$ that does not contain 1. Let $\mathcal{T}$ be the set of quantile indices $\{\tau : \tau = s\tau',\ s \in C\}$, where $C$ is any fixed compact subset of $(0, \infty)$, and $\tau'$ is the reference index $\searrow 0$.

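Lemma 10 (Properties) Assumption 2 and linearity [$Q_{Y_t}(\tau|X_t = x) = x'\beta(\tau)$] imply

(i) $K(x)$ is a function of linear index $K(x'c)$ defined after Assumption 2, section 4.

(ii) Centering constants $c(k)$ are those stated in Theorem 2 (Theorem 1 for Model 1).

Uniformly in $(l, m, \tau, x) \in M \times M \times \mathcal{T} \times X$, as $\tau' \searrow 0$:

(iii) For $\mu = c_{-1}/(m^{-\xi} - 1)$ if $\xi \ne 0$, $\mu = c_{-1}/\ln m$ if $\xi = 0$,

$$\beta_1(\tau) - \beta_{1r} \sim F_u^{-1}(\tau), \qquad \text{(D.55)}$$

$$\beta_{-1}(\tau) - \beta_{-1r} \sim \mu\,[F_u^{-1}(m\tau) - F_u^{-1}(\tau)], \qquad \text{(D.56)}$$

also if $\xi \ne 0$, $\beta_{-1}(\tau) - \beta_{-1r} \sim c_{-1} F_u^{-1}(\tau)$;

$$\text{(iv)} \quad \frac{(x - \mu_X)'(\beta(\tau) - \beta_r)}{\mu_X'(\beta(m\tau) - \beta(\tau))} \sim (x - \mu_X)_{-1}'\mu, \qquad \text{(D.57)}$$

$$\text{(v)} \quad \frac{x'(\beta(m\tau) - \beta(\tau))}{\mu_X'(\beta(m\tau) - \beta(\tau))} \sim \begin{cases} 1 & \text{if } \xi = 0, \\ x'c & \text{if } \xi \ne 0, \end{cases} \qquad \text{(D.58)}$$

$$\text{(vi)} \quad \frac{x'(\beta(m\tau) - \beta(\tau))}{x'(\beta(l\tau) - \beta(\tau))} \sim \frac{m^{-\xi} - 1}{l^{-\xi} - 1}. \qquad \text{(D.59)}$$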

We need a few facts. Note that these are for low, not high, quantiles. Write $F_u \in D(H_\xi)$, if $F_u$ is a cdf in the domain of minimum attraction with the tail index $\xi$ (Def. 3.2). Write $F_u \in R_\gamma(0)$, if $F_u$ is a regularly varying function with exponent $\gamma$ at 0.

Fact D.1 (On Regular Variation) Uniformly in $(m, l, \tau) \in M \times M \times \mathcal{T}$, as $\tau' \searrow 0$,

(i) If $F_1(z) \sim F_2(z)$ as $z \searrow 0$ or $-\infty$ and $F_1 \in D(H_\xi)$, then: $F_2 \in D(H_\xi)$, $F_1^{-1}(\tau) \sim F_2^{-1}(\tau)$, $F_1^{-1}(m\tau) - F_1^{-1}(\tau) \sim F_2^{-1}(m\tau) - F_2^{-1}(\tau)$, and $F_1^{-1}, F_2^{-1} \in R_{-\xi}(0)$.

(ii) Suppose $F_u(z|x) \sim K(x)F_u(z)$ uniformly in $x \in X$ (compact) as $z \searrow 0$ or $-\infty$, and $K(\cdot) > 0$ is continuous and uniformly bounded away from zero and above. Then $F_u^{-1}(\tau|x)$ relates to $F_u^{-1}(\tau/K(x))$ as in (i) uniformly in $x \in X$.

(iii) $\dfrac{F_u^{-1}(m\tau) - F_u^{-1}(\tau)}{F_u^{-1}(l\tau) - F_u^{-1}(\tau)} \to \dfrac{m^{-\xi} - 1}{l^{-\xi} - 1}$ if $F_u \in D(H_\xi)$.

(iv) $\dfrac{F_u^{-1}(m\tau) - F_u^{-1}(\tau)}{\ell(F_u^{-1}(\tau))} \to \ln m$ if $F_u \in D(H_0)$, where $\ell$ is the auxiliary function in section 3.6.

Except for (ii), (i)-(iv) are found in the texts on extreme values: [19], also [65], [52], [26]. (ii) holds from (i) pointwise, and uniformly by linearity of $F_u^{-1}(\tau|x)$ in $x$ and compactness of $X$.
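
Facts (iii) and (iv) can be checked numerically on closed-form quantile functions. The two example distributions below are hypothetical choices for this sketch: a cdf $1/(1 + (-z)^{\alpha})$ on $z < 0$ with a type 2 lower tail and $\xi = 1/\alpha$, and the logistic cdf, whose lower tail behaves like $e^z$ so that the auxiliary function is taken to be 1 (an assumption of the sketch, since the definition in section 3.6 is not reproduced here).

```python
import math

def q_type2(tau, alpha=2.0):
    """Quantile function of F(z) = 1 / (1 + (-z)^alpha), z < 0."""
    return -((1.0 / tau - 1.0) ** (1.0 / alpha))

def q_type1(tau):
    """Quantile function of the logistic cdf F(z) = e^z / (1 + e^z)."""
    return math.log(tau / (1.0 - tau))

def spacing_ratio(q, m, l, tau):
    """(q(m*tau) - q(tau)) / (q(l*tau) - q(tau)) for a quantile function q."""
    return (q(m * tau) - q(tau)) / (q(l * tau) - q(tau))

tau, m, l, xi = 1e-8, 2.0, 3.0, 0.5  # xi = 1/alpha with alpha = 2
ratio_type2 = spacing_ratio(q_type2, m, l, tau)
limit_type2 = (m ** (-xi) - 1.0) / (l ** (-xi) - 1.0)
# With the auxiliary function taken equal to 1, fact (iv) predicts ln m.
spacing_type1 = q_type1(m * tau) - q_type1(tau)
```

For small `tau`, `ratio_type2` is close to `limit_type2` and `spacing_type1` is close to $\ln 2$.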

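Proof of Lemma 10. Here 'locally uniformly' means 'uniformly in $(l, m, \tau, x) \in M \times M \times \mathcal{T} \times X$, as $\tau' \searrow 0$'. Recall also $(x - \mu_X)_1 = 0$ by assumption ($X$ includes the intercept). First, note $\mu_X'(\beta(\tau) - \beta_r) = \beta_1(\tau) - \beta_{1r} = F_u^{-1}(\tau|\mu_X) \sim F_u^{-1}(\tau)$ by assumption. And

$$
\begin{aligned}
(x - \mu_X)'A(\tau) \equiv \frac{(x - \mu_X)'(\beta(\tau) - \beta_r)}{F_u^{-1}(m\tau) - F_u^{-1}(\tau)}
&= \frac{F_u^{-1}(\tau|x) - F_u^{-1}(\tau|\mu_X)}{F_u^{-1}(m\tau) - F_u^{-1}(\tau)} \\
&\sim \frac{F_u^{-1}(\tau/K(x)) - F_u^{-1}(\tau)}{F_u^{-1}(m\tau) - F_u^{-1}(\tau)} \\
&\sim \frac{K(x)^{\xi} - 1}{m^{-\xi} - 1} \quad \Big(\frac{-\ln K(x)}{\ln m} \text{ if } \xi = 0\Big) \quad \text{locally uniformly} \quad \equiv B(x), \qquad \text{(D.60)}
\end{aligned}
$$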
which follows from the facts (ii) and (iii). Since $0 < B(x) < \infty$, (D.60) implies

$$|(x - \mu_X)'A(\tau) - B(x)| \to 0 \quad \text{locally uniformly}, \qquad \text{(D.61)}$$

and that, since $(x - \mu_X)_{-1}$ ranges over a non-degenerate subset of $R^{d-1}$,

$$A_{-1}(\tau) \to \kappa(m) \quad \text{locally uniformly}, \qquad \text{(D.62)}$$

where $\kappa(m)$ is a constant vector. Hence $B(x)$ is affine in $x$ on $X$.

If $\xi = 0$, conclude $B(x) = -\ln K(x)/\ln m = c'(x - \mu_X)/\ln m$, that is $K(x) = e^{-x'c}$ on $X$ where $\mu_X'c = 0$, i.e. $c_1 = 0$. If $\xi \ne 0$, conclude $B(x) = (K(x)^{\xi} - 1)/(m^{-\xi} - 1) = c'(x - \mu_X)/(m^{-\xi} - 1)$, that is $K(x) = (x'c)^{1/\xi}$ on $X$, where $\mu_X'c = 1$, $c_1 = 1$ ($\mu_X = (1, 0, \dots)'$).

By the assumption on $K(\cdot)$, $x'c$ is uniformly bounded on $X$; for types 2 and 3, $x'c$ is also uniformly positive on $X$. This shows claim (i).

Claim (iv) is verified by substituting the forms of $K(x)$ found into (D.60).

Claim (iii) follows directly from (D.60), (D.62), and the preceding paragraph. [Note also: $F_u^{-1}(m\tau) - F_u^{-1}(\tau) \sim (m^{-\xi} - 1)F_u^{-1}(\tau)$ if $\xi \ne 0$, locally uniformly].

Claim (ii). If $\xi \ne 0$, by claim (iii) uniformly in $k$ in any compact subset of $(0, \infty)$ as $T \to \infty$

$$a_T\big(\beta(k/T) - \beta_r\big) \sim a_T\,c\,F_u^{-1}(k/T) = c\,F_u^{-1}(k/T)/F_u^{-1}(1/T) \sim k^{-\xi}c, \qquad \text{(D.63)}$$

since by the fact (i) $F_u^{-1} \in R_{-\xi}(0)$. If $\xi = 0$, by claim (iii), facts (i) and (iv), and the definition of vector $c$ ($c_1 = 0$), uniformly in $k$ in any compact subset of $(0, \infty)$ as $T \to \infty$

$$a_T\big(\beta(k/T) - \beta_r - b_T e_1\big) \sim \frac{1}{\ell(F_u^{-1}(1/T))} \Big[ c\,\big(F_u^{-1}(ek/T) - F_u^{-1}(k/T)\big) + e_1\big(F_u^{-1}(k/T) - F_u^{-1}(1/T)\big) \Big] \qquad \text{(D.64)}$$

$$\to c\ln e + e_1 \ln k = c + e_1 \ln k.$$

Claim (v) holds pointwise in $x$ by facts (ii) and (iii). Since the ratio on the l.h.s. in (D.58) is linear in $x$ and $X$ is compact, it also holds uniformly in $x \in X$.
Finally, combine fact (iii) with claim (v) to have claim (vi). ■

E     Design  for  Extreme  Ranks 

This design condition ensures the essential uniqueness needed for Theorems 1 and 2.
Condition BC.3* (Sufficiency for Uniqueness & $O_p(1)$)

(a) For any set $\mathcal{X}$ in $X$ s.t. $\int_{\mathcal{X}} dF_X > 0$ and any $\kappa_1, \epsilon > 0$, there is $\kappa_2$ large s.t.

$$N([-\infty, \kappa_2] \times \mathcal{X} \cap E) \ge \kappa_1 \quad \text{w. pr.} \ge 1 - \epsilon. \qquad \text{(E.65)}$$

(b) $F_X$ has an absolutely continuous component (if $d \ge 2$), and $\{J_i\}$ are continuously distributed conditional on $\{X_i\}$. $F_X$ is non-degenerate in $R^d$. Normalize $\mu_X = (1, 0, \dots)'$.

Remark E.1 Assumptions (a) and (b) trivially hold for all of the limit Poisson RM obtained in Lemma 6 (for Theorems 1 and 2). One continuous covariate ensures a.s. uniqueness.


Fact E.1 (About Nondegeneracy) Nondegeneracy of distribution function $F$ with support $S \subset R^d$ means that for some positive constants $\delta$ and $K$ there exist sets $R_i$, $i \le K$, that cover $S$ s.t. for any $c$: $\|c\|_2 > 1$, $\exists i$: $\int_{R_i} dF > 0$ and $x'c \ge \delta\|c\|_2$ for all $x \in R_i$; cf. [60].
Lemma 11 (Essential Uniqueness and Tightness) Under BC.3*, for $\mathcal{K} = [\kappa_1, \kappa_2]$

(i) $k \notin \mathcal{K}_b$ w.p. 1 if $d \ge 2$, and $Leb(\mathcal{K}_b) = 0$ if $d = 1$ (no-covariate case),

(ii) $\sup_{k \in \mathcal{K}} \|Z(k)\| = O_p(1)$.

Proof of (i) The assumption (i) and the gradient conditions for (B.39) in the Remark 5.1 in section 5 [applicable given the tightness (ii)] ensure that for any given $k$, $k \in \mathcal{K}_b$ w.p. 0, if $d \ge 2$. When $d = 1$, $\mathcal{K}_b$ consists of integers $\{1, 2, \dots\} \cap \mathcal{K}$. ■
Proof of (ii) Select $z^f \in R^d$ s.t.

$$\sup_{k \in \mathcal{K}} Q_\infty(z^f, k) = \sup_{k \in \mathcal{K}} \Big[ -k\mu_X'z^f + \int_E l(u, x'z^f)\,dN(u,x) \Big] = O_p(1). \qquad \text{(E.66)}$$

(E.66) is possible as shown in the Proof of Theorem 2 and because $k\mu_X$ enters $Q_\infty$ linearly. By the linearity and convexity, if $z_1$ and $z_2$ are s.t. (E.66) holds, (E.66) also holds for any $z_3$ in the convex hull of $z_1, z_2$.

Consider ball $B(M)$ with radius $M$, centered at $z^f$, and let $z(k) = z^f + \delta(k)v(k)$, where $v(k)$ is a unit direction vector s.t. $\|v(k)\|_2 = 1$, and $\delta(k) \ge M$ for any $k$. $k$ and $v(k)$ are not fixed. By convexity in $z$, for all $k \in \mathcal{K}$

$$\frac{M}{\delta(k)} \big( Q_\infty(z(k), k) - Q_\infty(z^f, k) \big) \ge Q_\infty(z^*(k), k) - Q_\infty(z^f, k), \qquad \text{(E.67)}$$

where $z^*(k)$ is a point of boundary of $B(M)$ on the line connecting $z(k)$ and $z^f$. We will prove that for any $K$ and $\epsilon > 0$ there is $M$ large s.t.

$$P\Big( \inf_{k \in \mathcal{K}} Q_\infty(z^*(k), k) > K \Big) \ge 1 - \epsilon. \qquad \text{(E.68)}$$

(E.68), combined with (E.66), implies r.h.s. of (E.67) $> C > 0$ w.p. arbitrarily close to 1 for $M$ large enough, which verifies claim (ii) of the lemma.

Thus it remains to verify (E.68). For any direction $v(k)$, as $M \to \infty$,

• (a) $\mu_X'z^*(k) = z_1^f + v_1(k) \cdot M$, $\quad 1 \ge v_1(k) \ge 0$,

• (b) $\mu_X'z^*(k) = z_1^f + v_1(k) \cdot M$, $\quad -1 \le v_1(k) < 0$.

(Cases (a) are the worst). Fix some $\kappa_1 > 1$. In view of (a) and (b), because $v_1(k) \le 1$, it suffices to show that: for any $K$ and $\epsilon > 0$, uniformly in $k \in \mathcal{K}$

$$-\kappa_1 M + \int_E l(u, x'z^*(k))\,dN(u,x) > K \quad \text{w. pr.} \ge 1 - \epsilon, \text{ as } M \to \infty, \qquad \text{(E.69)}$$

and therefore (E.68). Hence it suffices to show that uniformly in $k \in \mathcal{K}$ for some $\kappa_2 > \kappa_1$

$$\int_E l(u, x'z^*(k))\,dN(u,x) \ge \kappa_2 M - \kappa_3 \quad \text{w. pr.} \ge 1 - \epsilon, \text{ as } M \to \infty, \qquad \text{(E.70)}$$

where $\kappa_3$ is some constant. By Fact E.1, for any direction $v(k)$: $\|v(k)\|_2 = 1$, $k \in \mathcal{K}$, there is $\mathcal{X}_{i(k,v)} = \{x \in X : x'z^*(k) \ge \kappa_4 M\}$ s.t. $\int_{\mathcal{X}_{i(k,v)}} dF_X > 0$, $\kappa_4 > 0$, and at most $K$ such sets $\mathcal{X}_{i(k,v)}$ correspond to all possible directions $v(k)$: $\|v(k)\|_2 = 1$ and $k \in \mathcal{K}$. Hence for any $(z^*(k), k)$ as $M \to \infty$

$$
\begin{aligned}
\int_E l(u, x'z^*(k))\,dN(u,x) &\ge \int_{E \cap [-\infty, \kappa_5] \times \mathcal{X}_{i(k,v)}} (x'z^*(k) - u)_+\,dN(u,x) \\
&\ge N\big(E \cap [-\infty, \kappa_5] \times \mathcal{X}_{i(k,v)}\big)\,(\kappa_4 M - \kappa_5)_+. \qquad \text{(E.71)}
\end{aligned}
$$

By BC.3*(a), we can select $\kappa_5$ large enough s.t. $N(E \cap [-\infty, \kappa_5] \times \mathcal{X}_i) \ge \kappa_2/\kappa_4$ for all $i \le K$ w. pr. $\ge 1 - \epsilon$. Now let $M \to \infty$. ■